Preface


The outlook of this second edition is the same as that of the original: to present linear 
algebra as the theory and practice of linear spaces and linear mappings. Where it aids 
understanding and calculations, I don't hesitate to describe vectors as arrays of
numbers and to describe mappings as matrices. Render unto Caesar the things which
are Caesar's. 

If you can reduce a mathematical problem to a problem in linear algebra, you can 
most likely solve it, provided that you know enough linear algebra. Therefore, a 
thorough grounding in linear algebra is highly desirable. A sound undergraduate 
education should offer a second course on the subject, at the senior level. I wrote this 
book as a suitable text for such a course. The changes made in this second edition are 
partly to make it more suitable as a text. Terse descriptions, especially in the early 
chapters, were expanded, more problems were added, and a list of solutions to 
selected problems has been provided. 

In addition, quite a bit of new material has been added, such as the compactness 
of the unit ball as a criterion of finite dimensionality of a normed linear space. A new 
chapter discusses the QR algorithm for finding the eigenvalues of a self-adjoint 
matrix. The Householder algorithm for turning such matrices into tridiagonal form is 
presented. I describe in some detail the beautiful observation of Deift, Nanda, and
Tomei of the analogy between the convergence of the QR algorithm and Moser's 
theorem on the asymptotic behavior of the Toda flow as time tends to infinity. 

Eight new appendices have been added to the first edition's original eight, 
including the Fast Fourier Transform, the spectral radius theorem, proved with the 
help of the Schur factorization of matrices, and an excursion into the theory of 
matrix-valued analytic functions. Appendix 11 describes the Lorentz group, 12 is an 
interesting application of the compactness criterion for finite dimensionality, 13 is a 
characterization of commutators, 14 presents a proof of Liapunov's stability 
criterion, 15 presents the construction of the Jordan Canonical form of matrices, and 
16 describes Carl Pearcy's elegant proof of Halmos' conjecture about the numerical 
range of matrices. 

I conclude with a plea to include the simplest aspects of linear algebra in high- 
school teaching: vectors with two and three components, the scalar product, the 
cross product, the description of rotations by matrices, and applications to geometry.
Such modernization of the high-school curriculum is long overdue. 

I acknowledge with pleasure much help I have received from Ray Michalek, as
well as useful conversations with Albert Novikoff and Charlie Peskin. I also would
like to thank Roger Horn, Beresford Parlett, and Jerry Kazdan for very useful
comments, and Jeffrey Ryan for help in proofreading.


PETER D. LAX 


New York, New York 


Preface to the First Edition 


This book is based on a lecture course designed for entering graduate students and 
given over a number of years at the Courant Institute of New York University. The 
course is open also to qualified undergraduates and on occasion was attended by 
talented high school students, among them Alan Edelman; I am proud to have been 
the first to teach him linear algebra. But, apart from special cases, the book, like the 
course, is for an audience that has some—not much—familiarity with linear algebra. 

Fifty years ago, linear algebra was on its way out as a subject for research. Yet
during the past five decades there has been an unprecedented outburst of new ideas 
about how to solve linear equations, carry out least square procedures, tackle 
systems of linear inequalities, and find eigenvalues of matrices. This outburst came 
in response to the opportunity created by the availability of ever faster computers 
with ever larger memories. Thus, linear algebra was thrust center stage in numerical 
mathematics. This had a profound effect, partly good, partly bad, on how the subject 
is taught today.

The presentation of new numerical methods brought fresh and exciting material, 
as well as realistic new applications, to the classroom. Many students, after all, are in 
a linear algebra class only for the applications. On the other hand, bringing 
applications and algorithms to the foreground has obscured the structure of linear 
algebra—a trend I deplore; it does students a great disservice to exclude them from 
the paradise created by Emmy Noether and Emil Artin. One of the aims of this book 
is to redress this imbalance. 

My second aim in writing this book is to present a rich selection of analytical
results and some of their applications: matrix inequalities, estimates for eigenvalues
and determinants, and so on. This beautiful aspect of linear algebra, so useful for
working analysts and physicists, is often neglected in texts. 

I strove to choose proofs that are revealing, elegant, and short. When there are
two different ways of viewing a problem, I like to present both. 

The Contents describes what is in the book. Here I would like to explain my 
choice of materials and their treatment. The first four chapters describe the abstract 
theory of linear spaces and linear transformations. In the proofs I avoid elimination 
of the unknowns one by one, but use the linear structure; I particularly exploit
quotient spaces as a counting device. This dry material is enlivened by some
nontrivial applications to quadrature, to interpolation by polynomials, and to solving 
the Dirichlet problem for the discretized Laplace equation. 

In Chapter 5, determinants are motivated geometrically as signed volumes of 
ordered simplices. The basic algebraic properties of determinants follow immediately. 

Chapter 6 is devoted to the spectral theory of arbitrary square matrices with 
complex entries. The completeness of eigenvectors and generalized eigenvectors is 
proved without the characteristic equation, relying only on the divisibility theory of 
the algebra of polynomials. In the same spirit we show that two matrices A and B are 
similar if and only if (A - kI)^m and (B - kI)^m have nullspaces of the same
dimension for all complex k and all positive integers m. The proof of this proposition
leads to the Jordan canonical form. 

Euclidean structure appears for the first time in Chapter 7. It is used in Chapter 8 
to derive the spectral theory of selfadjoint matrices. We present two proofs, one 
based on the spectral theory of general matrices, the other using the variational 
characterization of eigenvectors and eigenvalues. Fischer's minmax theorem is 
explained. 

Chapter 9 deals with the calculus of vector- and matrix-valued functions of a 
single variable, an important topic not usually discussed in the undergraduate 
curriculum. The most important result is the continuous and differentiable character 
of eigenvalues and normalized eigenvectors of differentiable matrix functions, 
provided that appropriate nondegeneracy conditions are satisfied. The fascinating 
phenomenon of "avoided crossings" is briefly described and explained.

The first nine chapters, or certainly the first eight, constitute the core of linear algebra. 
The next eight chapters deal with special topics, to be taken up depending on the interest 
of the instructor and of the students. We shall comment on them very briefly. 

Chapter 10 is a symphony of inequalities about matrices, their eigenvalues, and 
their determinants. Many of the proofs make use of calculus. 

I included Chapter 11 to make up for the unfortunate disappearance of mechanics
from the curriculum and to show how matrices give an elegant description of motion 
in space. Angular velocity of a rigid body and divergence and curl of a vector field all 
appear naturally. The monotonic dependence of eigenvalues of symmetric matrices 
is used to show that the natural frequencies of a vibrating system increase if the 
system is stiffened and the masses are decreased. 

Chapters 12, 13, and 14 are linked together by the notion of convexity. In Chapter 
12 we present the descriptions of convex sets in terms of gauge functions and support 
functions. The workhorse of the subject, the hyperplane separation theorem, is 
proved by means of the Hahn-Banach procedure. Carathéodory's theorem on
extreme points is proved and used to derive the König-Birkhoff theorem on doubly
stochastic matrices; Helly's theorem on the intersection of convex sets is stated and
proved. 

Chapter 13 is on linear inequalities; the Farkas-Minkowski theorem is derived 
and used to prove the duality theorem, which then is applied in the usual fashion to a 
maximum-minimum problem in economics, and to the minmax theorem of von 
Neumann about two-person zero-sum games. 

Chapter 14 is on normed linear spaces; it is mostly standard fare except for a dual
characterization of the distance of a point from a linear subspace. Linear mappings 
of normed linear spaces are discussed in Chapter 15. 

Chapter 16 presents Perron's beautiful theorem on matrices all of whose entries 
are positive. The standard application to the asymptotics of Markov chains is
described. In conclusion, the theorem of Frobenius about the eigenvalues of matrices 
with nonnegative entries is stated and proved. 

The last chapter discusses various strategies for solving iteratively systems of 
linear equations of the form Ax = b, A a self-adjoint, positive matrix. A variational 
formula is derived and a steepest descent method is analyzed. We go on to present 
several versions of iterations employing Chebyshev polynomials. Finally we 
describe the conjugate gradient method in terms of orthogonal polynomials. 

It is with genuine regret that I omit a chapter on the numerical calculation of 
eigenvalues of self-adjoint matrices. Astonishing connections have been discovered 
recently between this important subject and other seemingly unrelated topics. 

Eight appendices describe material that does not quite fit into the flow of the text, 
but that is so striking or so important that it is worth bringing to the attention of 
students. The topics I have chosen are special determinants that can be evaluated 
explicitly, Pfaff's theorem, symplectic matrices, tensor product, lattices, Strassen's
algorithm for fast matrix multiplication, Gershgorin’s theorem, and the multiplicity 
of eigenvalues. There are other equally attractive topics that could have been chosen: 
the Baker-Campbell-Hausdorff formula, the Kreiss matrix theorem, numerical 
range, and the inversion of tridiagonal matrices. 

Exercises are sprinkled throughout the text; a few of them are routine; most 
require some thinking and a few of them require some computing. 

My notation is neoclassical. I prefer to use four-letter Anglo-Saxon words like
"into," "onto," and "1-to-1," rather than polysyllabic ones of Norman origin. The
end of a proof is marked by an open square. 

The bibliography consists of the usual suspects and some recent texts; in addition, 
I have included Courant-Hilbert, Volume I, unchanged from the original German
version in 1924. Several generations of mathematicians and physicists, including the
author, first learned linear algebra from Chapter 1 of this source.

I am grateful to my colleagues at the Courant Institute and to Myron Allen at the 
University of Wyoming for reading and commenting on the manuscript and for 
trying out parts of it on their classes. I am grateful to Connie Engle and Janice Want
for their expert typing. 

I have learned a great deal from Richard Bellman's outstanding book,
Introduction to Matrix Analysis; its influence on the present volume is considerable. 
For this reason and to mark a friendship that began in 1945 and lasted until his death 
in 1984, I dedicate this book to his memory. 


PETER D. LAX 


New York, New York 


CHAPTER 1 


Fundamentals 


This first chapter aims to introduce the notion of an abstract linear space to those 
who think of vectors as arrays of components. I want to point out that the class of 
abstract linear spaces is no larger than the class of spaces whose elements are arrays. 
So what is gained by this abstraction? 

First of all, the freedom to use a single symbol for an array; this way we can think 
of vectors as basic building blocks, unencumbered by components. The abstract 
view leads to simple, transparent proofs of results. 

More to the point, the elements of many interesting vector spaces are not 
presented in terms of components. For instance, take a linear ordinary differential 
equation of degree n; the set of its solutions form a vector space of dimension n, yet 
they are not presented as arrays. 

Even if the elements of a vector space are presented as arrays of numbers, the 
elements of a subspace of it may not have a natural description as arrays. Take, for 
instance, the subspace of all vectors whose components add up to zero. 

Last but not least, the abstract view of vector spaces is indispensable for infinite- 
dimensional spaces; even though this text is strictly about finite-dimensional spaces, 
it is a good preparation for functional analysis. 


Linear algebra abstracts the two basic operations with vectors: the addition of 
vectors, and their multiplication by numbers (scalars). It is astonishing that on such 
slender foundations an elaborate structure can be built, with romanesque, gothic, and 
baroque aspects. It is even more astounding that linear algebra has not only the right
theorems but also the right language for many mathematical topics, including 
applications of mathematics. 

A linear space X over a field K is a mathematical object in which two operations 
are defined: 

Addition, denoted by +, as in 


x + y    (1)

and assumed to be commutative:

x + y = y + x,    (2)

and associative:

x + (y + z) = (x + y) + z,    (3)

and to form a group, with the neutral element denoted as 0:

x + 0 = x.    (4)

The inverse of addition is denoted by -:

x + (-x) ≡ x - x = 0.    (5)


EXERCISE 1. Show that the zero of vector addition is unique.


The second operation is multiplication of elements of X by elements k of the
field K:

kx.

The result of this multiplication is a vector, that is, an element of X.
Multiplication by elements of K is assumed to be associative:

k(ax) = (ka)x    (6)
and distributive: 
k(x + y) = kx + ky, (7) 
as well as 
(a + b)x = ax + bx. (8) 


We assume that multiplication by the unit of K, denoted as 1, acts as the identity:

1x = x.    (9)


These are the axioms of linear algebra. We proceed to draw some deductions: 
Set b = 0 in (8); it follows from Exercise 1 that for all x

0x = 0.    (10)

Set a = 1, b = -1 in (8); using (9) and (10) we deduce that for all x

(-1)x = -x.


EXERCISE 2. Show that the vector with all components zero serves as the zero 
element of classical vector addition. 


In this analytically oriented text the field K will be either the field R of real
numbers or the field C of complex numbers.

An interesting example of a linear space is the set of all functions x(t) that satisfy
the differential equation

d^2x/dt^2 + x = 0.


The sum of two solutions is again a solution, and so is the constant multiple of one. 
This shows that the set of solutions of this differential equation form a linear space. 

Solutions of this equation describe the motion of a mass connected to a fixed 
point by a spring. Once the initial position x(0) = p and initial velocity (d/dt)x(0) = v
are given, the motion is completely determined for all t. So solutions can be
described by a pair of numbers (p, v).

The relation between the two descriptions is linear; that is, if (p, v) are the initial
data of a solution x(t), and (q, w) the initial data of another solution y(t), then the
initial data of the solution x(t) + y(t) are (p + q, v + w) = (p, v) + (q, w). Similarly,
the initial data of the solution kx(t) are (kp, kv) = k(p, v).

This kind of relation has been abstracted into the notion of isomorphism. 
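
The correspondence between solutions and initial data can be checked numerically. The following is a minimal sketch, not part of the original text. It assumes the standard fact that x(t) = p cos t + v sin t solves the equation above with initial position p and initial velocity v; the helper names solution and initial_data are hypothetical.

```python
import math

def solution(p, v):
    """Solution of x'' + x = 0 with x(0) = p and x'(0) = v (assumed closed form)."""
    return lambda t: p * math.cos(t) + v * math.sin(t)

def initial_data(x, h=1e-6):
    """Recover (x(0), x'(0)); the velocity uses a central difference, so it is approximate."""
    return x(0.0), (x(h) - x(-h)) / (2 * h)

x = solution(1.0, 2.0)     # initial data (p, v) = (1, 2)
y = solution(-3.0, 0.5)    # initial data (q, w) = (-3, 0.5)

def total(t):
    return x(t) + y(t)     # sum of two solutions

def scaled(t):
    return 4.0 * x(t)      # scalar multiple with k = 4

ps, vs = initial_data(total)    # close to (p + q, v + w)
pm, vm = initial_data(scaled)   # close to (kp, kv)
print(ps, vs, pm, vm)
```

Up to the error of the finite-difference derivative, the initial data of the sum are the sums of the initial data, and those of the multiple are the multiples, which is the linearity described above.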


Definition. A one-to-one correspondence between two linear spaces over the 
same field that maps sums into sums and scalar multiples into scalar multiples is 
called an isomorphism. 


Isomorphism is a basic notion in linear algebra. Isomorphic linear spaces are
indistinguishable by means of operations available in linear spaces. Two linear 
spaces that are presented in very different ways can be, as we have seen, isomorphic. 


Examples of Linear Spaces. (i) Set of all row vectors: (a_1, ..., a_n), a_j in K;
addition, multiplication defined componentwise. This space is denoted as K^n.

(ii) Set of all real-valued functions f(x) defined on the real line, K = R. 

(iii) Set of all functions with values in K, defined on an arbitrary set S.

(iv) Set of all polynomials of degree less than n with coefficients in K. 


EXERCISE 3. Show that (i) and (iv) are isomorphic.


EXERCISE 4. Show that if S has n elements, (i) and (iii) are isomorphic.


EXERCISE 5. Show that when K = R, (iv) is isomorphic with (iii) when S
consists of n distinct points of R.


Definition. A subset Y of a linear space X is called a subspace if sums and scalar
multiples of elements of Y belong to Y. 


Examples of Subspaces. (a) X as in Example (i), Y the set of vectors
(0, a_2, ..., a_{n-1}, 0) whose first and last components are zero.

(b) X as in Example (ii), Y the set of all periodic functions with period π.

(c) X as in Example (iii), Y the set of constant functions on S.


(d) X as in Example (iv), Y the set of all even polynomials. 


Definition. The sum of two subsets Y and Z of a linear space X, denoted as 
Y + Z, is the set of all vectors of form y + z, y in Y, z in Z.


EXERCISE 6. Prove that Y + Z is a linear subspace of X if Y and Z are. 


Definition. The intersection of two subsets Y and Z of a linear space X, denoted 
as Y ∩ Z, consists of all vectors x that belong to both Y and Z.


EXERCISE 7. Prove that if Y and Z are linear subspaces of X, so is Y ∩ Z.


EXERCISE 8. Show that the set {0} consisting of the zero element of a linear 
space X is a subspace of X. It is called the trivial subspace. 


Definition. A linear combination of j vectors x_1, ..., x_j of a linear space is a
vector of the form

k_1 x_1 + ... + k_j x_j,    k_1, ..., k_j ∈ K.


EXERCISE 9. Show that the set of all linear combinations of x_1, ..., x_j is a
subspace of X, and that it is the smallest subspace of X containing x_1, ..., x_j. This is
called the subspace spanned by x_1, ..., x_j.


Definition. A set of vectors x_1, ..., x_m in X span the whole space X if every x in
X can be expressed as a linear combination of x_1, ..., x_m.


Definition. The vectors x_1, ..., x_j are called linearly dependent if there is a
nontrivial linear relation between them, that is, a relation of the form

k_1 x_1 + ... + k_j x_j = 0,

where not all k_1, ..., k_j are zero.


Definition. A set of vectors x_1, ..., x_j that are not linearly dependent is called
linearly independent.


EXERCISE 10. Show that if the vectors x_1, ..., x_j are linearly independent, then
none of the x_i is the zero vector.


Lemma 1. Suppose that the vectors x_1, ..., x_n span a linear space X and that the
vectors y_1, ..., y_j in X are linearly independent. Then

j ≤ n.

Proof. Since x_1, ..., x_n span X, every vector in X can be written as a linear
combination of x_1, ..., x_n. In particular, y_1:

y_1 = k_1 x_1 + ... + k_n x_n.

Since y_1 ≠ 0 (see Exercise 10), not all k are equal to 0, say k_i ≠ 0. Then x_i can be
expressed as a linear combination of y_1 and the remaining x's. So the set consisting of
the x's, with x_i replaced by y_1, spans X. If j ≥ n, repeat this step n - 1 more times and
conclude that y_1, ..., y_n span X; if j > n, this contradicts the linear independence of
the y's, for then y_{n+1} is a linear combination of y_1, ..., y_n. □


Definition. A finite set of vectors which span X and are linearly independent is 
called a basis for X. 


Lemma 2. A linear space X which is spanned by a finite set of vectors x_1, ..., x_n
has a basis.


Proof. If x_1, ..., x_n are linearly dependent, there is a nontrivial relation between
them; from this one of the x_i can be expressed as a linear combination of the rest. So
we can drop that x_i. Repeat this step until the remaining x_i are linearly independent;
they still span X, and so they form a basis. □
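
The proof of Lemma 2 suggests a small computation. The sketch below is an illustration under stated assumptions, not the procedure of the text verbatim: instead of dropping dependent vectors one at a time, it scans the spanning list once and keeps a vector only if it is not a linear combination of those already kept. Dependence is detected by Gaussian elimination over the rationals; the function names rank and extract_basis are hypothetical.

```python
from fractions import Fraction

def rank(vectors):
    """Number of linearly independent vectors in the list (forward elimination)."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def extract_basis(vectors):
    """Keep each vector that is independent of the ones kept before it."""
    basis = []
    for v in vectors:
        if rank(basis + [v]) > len(basis):
            basis.append(v)
    return basis

spanning = [(1, 0), (2, 0), (1, 1), (0, 1)]   # spans the plane, with redundancy
basis = extract_basis(spanning)
print(basis)
```

The vectors that are kept still span the same space, since each discarded vector is a combination of kept ones, and they are independent by construction.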


Definition. A linear space X is called finite dimensional if it has a basis. 

A finite-dimensional space has many, many bases. When the elements of the 
space are represented as arrays with n components, we give preference to the special 
basis consisting of the vectors that have one component equal to 1, while all the
others equal 0.


Theorem 3. All bases for a finite-dimensional linear space X contain the same 
number of vectors. This number is called the dimension of X and is denoted as 


dim X. 

Proof. Let x_1, ..., x_n be one basis, and let y_1, ..., y_m be another. By Lemma 1 and
the definition of basis we conclude that m ≤ n, and also n ≤ m. So we conclude that
n and m are equal. □


We define the dimension of the trivial space consisting of the single element 0 to 
be zero. 


Theorem 4. Every linearly independent set of vectors y_1, ..., y_j in a finite-
dimensional linear space X can be completed to a basis of X. 


Proof. If y_1, ..., y_j do not span X, there is some x_1 that cannot be expressed as a
linear combination of y_1, ..., y_j. Adjoin this x_1 to the y's. Repeat this step until the
y's span X. This will happen in less than n steps, n = dim X, because otherwise X
would contain more than n linearly independent vectors, impossible for a space of
dimension n. □


Theorem 4 illustrates the many different ways of forming a basis for a linear 
space. 


Theorem 5. (a) Every subspace Y of a finite-dimensional linear space X is 
finite dimensional. 

(b) Every subspace Y has a complement in X, that is, another subspace Z such 
that every vector x in X can be decomposed uniquely as 


x = y + z,    y in Y, z in Z.    (11)

Furthermore

dim X = dim Y + dim Z.    (11)'


Proof. We can construct a basis in Y by starting with any nonzero vector y_1, and
then adding another vector y_2 and another, as long as they are linearly independent.
According to Lemma 1, there can be no more of these y_i than the dimension of X. A
maximal set of linearly independent vectors y_1, ..., y_j in Y spans Y, and so forms a
basis of Y. According to Theorem 4, this set can be completed to form a basis of X by
adjoining z_{j+1}, ..., z_n. Define Z as the space spanned by z_{j+1}, ..., z_n; clearly Y and
Z are complements, and

dim X = n = j + (n - j) = dim Y + dim Z. □

Definition. X is said to be the direct sum of two subspaces Y and Z that are
complements of each other. More generally X is said to be the direct sum of its
subspaces Y_1, ..., Y_m if every x in X can be expressed uniquely as

x = y_1 + ... + y_m,    y_i in Y_i.    (12)

This relation is denoted as

X = Y_1 ⊕ ... ⊕ Y_m.


EXERCISE 11. Prove that if X is finite dimensional and the direct sum of
Y_1, ..., Y_m, then

dim X = Σ_i dim Y_i.    (12)'


Definition. An (n — 1)-dimensional subspace of an n-dimensional space is 
called a hyperplane. 


EXERCISE 12. Show that every finite-dimensional space X over K is isomorphic
to K^n, n = dim X. Show that this isomorphism is not unique when n is 1.


Since every n-dimensional linear space over K is isomorphic to K^n, it follows that
two linear spaces over the same field and of the same dimension are isomorphic. 

Note: There are many ways of forming such an isomorphism; it is not unique. 

The concept of congruence modulo a subspace, defined below, is a very useful
tool. 


Definition. For X a linear space, Y a subspace, we say that two vectors x_1, x_2 in X
are congruent modulo Y, denoted

x_1 ≡ x_2 mod Y,

if x_1 - x_2 ∈ Y. Congruence mod Y is an equivalence relation, that is, it is

(i) symmetric: if x_1 ≡ x_2, then x_2 ≡ x_1.
(ii) reflexive: x ≡ x for all x in X.
(iii) transitive: if x_1 ≡ x_2, x_2 ≡ x_3, then x_1 ≡ x_3.


EXERCISE 13. Prove (i)-(iii) above. Show furthermore that if x_1 ≡ x_2, then
kx_1 ≡ kx_2 for every scalar k.


We can divide elements of X into congruence classes mod Y. The congruence 
class containing the vector x is the set of all vectors congruent with x; we denote it
by {x}. 


EXERCISE 14. Show that two congruence classes are either identical or disjoint.


The set of congruence classes can be made into a linear space by defining addition 
and multiplication by scalars, as follows: 


{x} + {z} = {x + z}

and

k{x} = {kx}.


That is, the sum of the congruence class containing x and the congruence 
class containing z is the class containing x+ z. Similarly for multiplication by 
scalars. 


EXERCISE 15. Show that the above definition of addition and multiplication 
by scalars is independent of the choice of representatives in the congruence 
class. 


The linear space of congruence classes defined above is called the quotient space 
of X mod Y and is denoted as 


X(mod Y) or X/Y. 


The following example is illuminating: Take X to be the linear space of all row
vectors (a_1, ..., a_n) with n components, and take Y to be all vectors
y = (0, 0, a_3, ..., a_n) whose first two components are zero. Then two vectors are
congruent mod Y iff their first two components are equal. Each equivalence class can 
be represented by a vector with two components, the common components of all 
vectors in the equivalence class. 

This shows that forming a quotient space amounts to throwing away information 
contained in those components that pertain to Y. This is a very useful simplification 
when we do not need the information contained in the neglected components. 
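
The example above can be made concrete in a few lines. This is a minimal sketch, assuming vectors are stored as Python tuples with at least two components; the function names congruent_mod_Y and coset_rep are hypothetical and do not come from the text.

```python
def congruent_mod_Y(x1, x2):
    """x1 and x2 are congruent mod Y iff x1 - x2 has its first two components zero."""
    diff = [a - b for a, b in zip(x1, x2)]
    return diff[0] == 0 and diff[1] == 0

def coset_rep(x):
    """Represent the class {x} by the two components shared by all its members."""
    return tuple(x[:2])

def add(x, z):
    """Componentwise addition in X."""
    return tuple(a + b for a, b in zip(x, z))

x1 = (1, 2, 3, 4)
x2 = (1, 2, -7, 0)    # differs from x1 only in the components that pertain to Y
x3 = (1, 5, 3, 4)     # differs from x1 in the second component

same_class = congruent_mod_Y(x1, x2)
different_class = not congruent_mod_Y(x1, x3)

# Addition of classes does not depend on the representative chosen:
sum_a = coset_rep(add(x1, x3))
sum_b = coset_rep(add(x2, x3))
print(same_class, different_class, sum_a, sum_b)
```

Replacing x1 by the congruent vector x2 leaves the class of the sum unchanged, which is the independence of representatives that the definition of addition in X/Y requires.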

The next result shows the usefulness of quotient spaces for counting the 
dimension of a subspace. 


Theorem 6. If Y is a subspace of a finite-dimensional linear space X, then


dim Y + dim(X/Y) = dim X. (13) 

Proof. Let y_1, ..., y_j be a basis for Y, j = dim Y. According to Theorem 4, this set
can be completed to form a basis for X by adjoining x_{j+1}, ..., x_n, n = dim X. We
claim that

{x_{j+1}}, ..., {x_n}    (13)'

form a basis for X/Y. To show this we have to verify two properties of the cosets
(13)':

(i) They span X/Y.
(ii) They are linearly independent.

(i) Since y_1, ..., x_n form a basis for X, every x in X can be expressed as

x = Σ a_i y_i + Σ b_k x_k.

It follows that

{x} = Σ b_k {x_k}.

(ii) Suppose that

Σ c_k {x_k} = 0.

This means that

Σ c_k x_k = y,    y in Y.

Express y as Σ d_i y_i; we get

Σ c_k x_k - Σ d_i y_i = 0.

Since y_1, ..., x_n form a basis, they are linearly independent, and so all the c_k and d_i
are zero.
It follows that

dim X/Y = n - j.

So

dim Y + dim X/Y = j + n - j = n = dim X. □


EXERCISE 16. Denote by X the linear space of all polynomials p(t) of degree
< n, and denote by Y the set of polynomials that are zero at t_1, ..., t_j, j < n.


(i) Show that Y is a subspace of X. 
(ii) Determine dim Y. 
(iii) Determine dim X/Y. 


The following corollary is a consequence of Theorem 6. 


Corollary 6'. A subspace Y of a finite-dimensional linear space X whose 
dimension is the same as the dimension of X is all of X. 

EXERCISE 17. Prove Corollary 6'.


Theorem 7. Suppose X is a finite-dimensional linear space, U and V two
subspaces of X such that X is the sum of U and V:

X = U + V.

Denote by W the intersection of U and V:

W = U ∩ V.

Then

dim X = dim U + dim V - dim W.    (14)


Proof. When the intersection W of U and V is the trivial space {0}, dim W = 0,
and (14) is relation (11)' of Theorem 5. We show now how to use the notion of
quotient space to reduce the general case to the simple case dim W = 0.

Define U_0 = U/W, V_0 = V/W; then U_0 ∩ V_0 = {0}, and so X_0 = X/W satisfies

X_0 = U_0 + V_0.

So according to (11)',

dim X_0 = dim U_0 + dim V_0.    (14)'

Applying (13) of Theorem 6 three times, we get

dim X_0 = dim X - dim W,    dim U_0 = dim U - dim W,
dim V_0 = dim V - dim W.

Setting this into relation (14)' gives (14). □
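
Relation (14) can be checked on a small example. The sketch below is not from the text: it takes U and V to be two coordinate planes in three-dimensional space, and it assumes, by inspection rather than by computation, that their intersection W is the line spanned by (0, 1, 0). Dimensions are computed by Gaussian elimination over the rationals; the function name rank is hypothetical.

```python
from fractions import Fraction

def rank(vectors):
    """Dimension of the span of the given vectors (forward elimination)."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

U = [(1, 0, 0), (0, 1, 0)]   # spanning set of U
V = [(0, 1, 0), (0, 0, 1)]   # spanning set of V
W = [(0, 1, 0)]              # assumed basis of the intersection, found by inspection

dim_U, dim_V, dim_W = rank(U), rank(V), rank(W)
dim_sum = rank(U + V)        # the combined list spans the subspace U + V

# Sanity check that the assumed W really lies in both U and V.
in_both = all(rank(U + [w]) == dim_U and rank(V + [w]) == dim_V for w in W)
print(dim_sum, dim_U + dim_V - dim_W, in_both)
```

Here both sides of (14) equal 3. The check that W lies in both subspaces does not by itself show that W is the whole intersection; that part remains an assumption of the sketch.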


Definition. The Cartesian sum of two linear spaces over the same field is the set
of pairs

(x_1, x_2),    x_1 in X_1, x_2 in X_2,

where addition and multiplication by scalars is defined componentwise. The direct
sum is denoted as

X_1 ⊕ X_2.

It is easy to verify that X_1 ⊕ X_2 is indeed a linear space.


EXERCISE 18. Show that

dim X_1 ⊕ X_2 = dim X_1 + dim X_2.


EXERCISE 19. X a linear space, Y a subspace. Show that Y ⊕ X/Y is isomorphic
to X.


Note: The most frequently occurring linear spaces in this text are our old friends
R^n and C^n, the spaces of vectors (a_1, ..., a_n) with n real, respectively complex,
components.

So far the only means we have for showing that a linear space X is finite 
dimensional is to find a finite set of vectors that span it. In Chapter 7 we present 
another, powerful criterion for a Euclidean space to be finite dimensional. In Chapter 
14 we extend this criterion to all normed linear spaces. 

We have been talking about sets of vectors being linearly dependent or 
independent, but have given no indication how to decide which is the case. Here is an 
example: 

Decide if the four vectors

( 1)   ( 1)   ( 2)   ( 2)
( 1)   (-1)   ( 1)   (-1)
( 0)   ( 1)   ( 1)   ( 2)
( 1)   ( 1)   ( 3)   ( 3)

are linearly dependent or not. That is, are there four numbers k_1, k_2, k_3, k_4, not all
zero, such that

    ( 1)       ( 1)       ( 2)       ( 2)   (0)
k_1 ( 1) + k_2 (-1) + k_3 ( 1) + k_4 (-1) = (0) ?
    ( 0)       ( 1)       ( 1)       ( 2)   (0)
    ( 1)       ( 1)       ( 3)       ( 3)   (0)

This vector equation is equivalent to four scalar equations:

k_1 + k_2 + 2k_3 + 2k_4 = 0,
k_1 - k_2 + k_3 - k_4 = 0,
k_2 + k_3 + 2k_4 = 0,
k_1 + k_2 + 3k_3 + 3k_4 = 0.


The study of such systems of linear equations is the subject of Chapters 3 and 4. 
There we describe an algorithm for finding all solutions of such systems of 
equations. 
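
For readers who want to experiment before reaching those chapters, here is a minimal sketch of how such a question can be decided by machine. It takes the four vectors to be the columns of the array displayed above, and it uses ordinary Gaussian elimination over the rationals, which may differ in detail from the algorithm of Chapters 3 and 4. The function name rank is hypothetical.

```python
from fractions import Fraction

def rank(vectors):
    """Number of linearly independent vectors in the list (forward elimination)."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

# The four vectors, read column by column from the display.
vectors = [(1, 1, 0, 1), (1, -1, 1, 1), (2, 1, 1, 3), (2, -1, 2, 3)]

r = rank(vectors)
dependent = r < len(vectors)

# One candidate relation, verified directly: k = (1, -1, -1, 1).
k = (1, -1, -1, 1)
combo = tuple(sum(ki * v[i] for ki, v in zip(k, vectors)) for i in range(4))
print(r, dependent, combo)
```

With the vectors as read here, only three are independent, and the combination with coefficients (1, -1, -1, 1) is the zero vector, so the set is linearly dependent.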

EXERCISE 20. Which of the following sets of vectors x = (x_1, ..., x_n) in R^n are a
subspace of R^n? Explain your answer.

(a) All x such that x_1 ≥ 0.
(b) All x such that x_1 + x_2 = 0.
(c) All x such that x_1 + x_2 + 1 = 0.
(d) All x such that x_1 = 0.
(e) All x such that x_1 is an integer.


EXERCISE 21. Let U, V, and W be subspaces of some finite-dimensional vector 
space X. Is the statement 


dim(U + V + W) = dim U + dim V + dim W - dim(U ∩ V) - dim(U ∩ W)
- dim(V ∩ W) + dim(U ∩ V ∩ W),


true or false? If true, prove it. If false, provide a counterexample. 


CHAPTER 2 


Duality 


Readers who are meeting the concept of an abstract linear space for the first time 
may balk at the notion of the dual space as piling an abstraction on top of an 
abstraction. I hope that the results presented at the end of this chapter will convince 
such skeptics that the notion is not only natural but useful for expeditiously deriving 
interesting concrete results. The dual of a normed linear space, presented in Chapter 
14, is a particularly fruitful idea. 

The dual of an infinite-dimensional normed linear space is indispensable for their 
study. 

Let X be a linear space over a field K. A scalar valued function l,

l: X → K,

defined on X, is called linear if

l(x + y) = l(x) + l(y)    (1)

for all x, y in X, and

l(kx) = kl(x)    (1)'

for all x in X and all k in K. Note that these two properties, applied repeatedly, show
that

l(k_1 x_1 + ... + k_n x_n) = k_1 l(x_1) + ... + k_n l(x_n).    (1)''

We define the sum of two functions by pointwise addition; that is,

(l + m)(x) = l(x) + m(x).

Multiplication of a function by a scalar is defined similarly. It is easy to verify that
the sum of two linear functions is linear, as is the scalar multiple of one. Thus the set
of linear functions on a linear space X itself forms a linear space, called the dual of X
and denoted by X'.


EXAMPLE 1. X = {continuous functions f(s), $0 \le s \le 1$}. Then for any point
$s_1$ in [0, 1],

    $l(f) = f(s_1)$

is a linear function. So is

    $l(f) = \sum_{j=1}^{n} k_j f(s_j)$,

where the $s_j$ are an arbitrary collection of points in [0, 1] and the $k_j$ are arbitrary scalars. So is

    $l(f) = \int_0^1 f(s)\,ds$.


EXAMPLE 2. X = {differentiable functions f on [0, 1]}. For s in [0, 1],

    $l(f) = \sum_{j=1}^{n} a_j \partial^j f(s)$

is a linear function, where $\partial^j$ denotes the jth derivative.
Theorem 1. Let X be a linear space of dimension n. The elements x of X can be
represented as arrays of n scalars:

    $x = (c_1, \ldots, c_n)$.    (3)

Addition and multiplication by a scalar are defined componentwise. Let $a_1, \ldots, a_n$ be
any array of n scalars; the function l defined by

    $l(x) = a_1 c_1 + \cdots + a_n c_n$    (4)

is a linear function of x. Conversely, every linear function l of x can be so
represented.


Proof. That l(x) defined by (4) is a linear function of x is obvious. The converse is
not much harder. Let l be any linear function defined on X. Define $x_j$ to be the vector
whose jth component is 1, with all other components zero. Then x defined by (3) can
be expressed as

    $x = c_1 x_1 + \cdots + c_n x_n$.

Denote $l(x_j)$ by $a_j$; it follows from formula (1)" that l is of form (4). □

Theorem 1 shows that if the vectors in X are regarded as arrays of n scalars, then
the elements l of X' can also be regarded as arrays of scalars. It follows from (4)
that the sum of two linear functions is represented by the sum of the two arrays
representing the summands.

Similarly, multiplication of l by a scalar is accomplished by multiplying each
component. We deduce from all this the following theorem.


Theorem 2. The dual X' of a finite-dimensional linear space X is a finite-dimensional
linear space, and

    dim X' = dim X.
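The representation in Theorem 1 can be tried out numerically. The following is a minimal sketch, not part of the original text: the helper names `coefficients` and `from_coefficients` and the sample function `l` are hypothetical. It recovers the array $a_1, \ldots, a_n$ by evaluating l on the unit arrays, then rebuilds l from formula (4).

```python
def coefficients(l, n):
    """Return [l(x_1), ..., l(x_n)], where x_j is the j-th unit array."""
    def unit(j):
        return [1 if i == j else 0 for i in range(n)]
    return [l(unit(j)) for j in range(n)]

def from_coefficients(a):
    """Build the linear function x -> a_1 c_1 + ... + a_n c_n of formula (4)."""
    return lambda x: sum(aj * cj for aj, cj in zip(a, x))

# Hypothetical linear function on arrays of 3 scalars.
l = lambda x: 2 * x[0] - x[1] + 5 * x[2]
a = coefficients(l, 3)           # the array representing l
l_rebuilt = from_coefficients(a)
x = [3, 4, -1]
print(a, l(x), l_rebuilt(x))
```

The array `a` has as many entries as x has components, which is the content of Theorem 2 in this coordinate picture.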


The right-hand side of (4) depends symmetrically on the two arrays representing
x and l. Therefore we ought to write the left-hand side also symmetrically; we
accomplish that by the scalar product notation

    $(l, x) = l(x)$.    (5)

We call it a product because it is a bilinear function of l and x: for fixed l it is a linear
function of x, and for fixed x it is a linear function of l.

Since X' is a linear space, it has its own dual X" consisting of all linear functions
on X'. For fixed x, (l, x) is such a linear function. By Theorem 1, all linear functions
are of this form. This proves the following theorem.


Theorem 3. The bilinear function (l, x) defined in (5) gives a natural
identification of X with X".


EXERCISE 1. Given a nonzero vector $x_1$ in X, show that there is a linear function l
such that

    $l(x_1) \ne 0$.


Definition. Let Y be a subspace of X. The set of linear functions l that vanish on
Y, that is, satisfy

    $l(y) = 0$ for all y in Y,    (6)

is called the annihilator of the subspace Y; it is denoted by $Y^\perp$.

EXERCISE 2. Verify that $Y^\perp$ is a subspace of X'.


Theorem 4. Let Y be a subspace of a finite-dimensional space X, $Y^\perp$ its
annihilator. Then

    $\dim Y^\perp + \dim Y = \dim X$.    (7)


Proof. We shall establish a natural isomorphism between $Y^\perp$ and the dual $(X/Y)'$
of X/Y. Given l in $Y^\perp$ we define L in $(X/Y)'$ as follows: for any congruence class {x}
in X/Y, we define

    $L\{x\} = l(x)$.    (8)

It follows from (6) that this definition of L is unequivocal, that is, does not depend on
the element x picked to represent the class.

Conversely, given any L in $(X/Y)'$, (8) defines a linear function l on X that
satisfies (6). Clearly, the correspondence between l and L is one-to-one and an
isomorphism. Thus since isomorphic linear spaces have the same dimension,

    $\dim Y^\perp = \dim (X/Y)'$.

By Theorem 2, $\dim (X/Y)' = \dim X/Y$, and by Theorem 6 of Chapter 1,
$\dim X/Y = \dim X - \dim Y$, so Theorem 4 follows. □


The dimension of $Y^\perp$ is called the codimension of Y as a subspace of X. By
Theorem 4,

    codim Y + dim Y = dim X.

Since $Y^\perp$ is a subspace of X', its annihilator, denoted by $Y^{\perp\perp}$, is a subspace
of X".


Theorem 5. Under the identification (5) of X" and X, for every subspace Y of a
finite-dimensional space X,

    $Y^{\perp\perp} = Y$.

Proof. It follows from definition (6) of the annihilator of Y that all y in Y belong to
$Y^{\perp\perp}$, the annihilator of $Y^\perp$. To show that Y is all of $Y^{\perp\perp}$, we make use of (7) applied
to X' and its subspace $Y^\perp$:

    $\dim Y^{\perp\perp} + \dim Y^\perp = \dim X'$.    (7)'

Since dim X' = dim X, it follows by comparing (7) and (7)' that

    $\dim Y^{\perp\perp} = \dim Y$.

So Y is a subspace of $Y^{\perp\perp}$ that has the same dimension as $Y^{\perp\perp}$; but then according to
Corollary 6' in Chapter 1, $Y = Y^{\perp\perp}$. □

The following notion is useful:


Definition. Let X be a finite-dimensional linear space, and let S be a subset of
X. The annihilator $S^\perp$ of S is the set of linear functions l that are zero at all vectors
s of S:

    $l(s) = 0$ for s in S.

Theorem 6. Denote by Y the smallest subspace containing S. Then

    $S^\perp = Y^\perp$.


EXERCISE 3. Prove Theorem 6.

According to formalist philosophy, all of mathematics is tautology. Chapter 2
might strike the reader, as it does the author, as quintessential tautology. Yet even
this trivial-looking material has some interesting consequences:


Theorem 7. Let I be an interval on the real axis, $t_1, \ldots, t_n$ n distinct points.
Then there exist n numbers $m_1, \ldots, m_n$ such that the quadrature formula

    $\int_I p(t)\,dt = m_1 p(t_1) + \cdots + m_n p(t_n)$    (9)

holds for all polynomials p of degree less than n.


Proof. Denote by X the space of all polynomials $p(t) = a_0 + a_1 t + \cdots + a_{n-1} t^{n-1}$
of degree less than n. Since X is isomorphic to the space of arrays $(a_0, a_1, \ldots, a_{n-1})$, that is, to
$\mathbb{R}^n$, dim X = n. We define $l_j$ as the linear function

    $l_j(p) = p(t_j)$.    (10)

The $l_j$ are elements of the dual space of X; we claim that they are linearly
independent. For suppose there is a linear relation between them:

    $c_1 l_1 + \cdots + c_n l_n = 0$.    (11)

According to the definition of the $l_j$, (11) means that

    $c_1 p(t_1) + \cdots + c_n p(t_n) = 0$    (12)

for all polynomials p of degree less than n. Define the polynomial $q_k$ as the product

    $q_k(t) = \prod_{j \ne k} (t - t_j)$.

Clearly, $q_k$ is of degree $n - 1$, and is zero at all points $t_j$, $j \ne k$. Since the points $t_j$ are
distinct, $q_k$ is nonzero at $t_k$. Set $p = q_k$ in (12); since $q_k(t_j) = 0$ for $j \ne k$, we obtain
that $c_k q_k(t_k) = 0$; since $q_k(t_k)$ is not zero, $c_k$ must be. This shows that all coefficients
$c_k$ are zero, that is, that the linear relation (11) is trivial. Thus the $l_j$, j = 1, ..., n, are n
linearly independent elements of X'. According to Theorem 2, dim X' = dim X = n;
therefore the $l_j$ form a basis of X'. This means that any other linear function l on X
can be represented as a linear combination of the $l_j$:

    $l = m_1 l_1 + \cdots + m_n l_n$.

The integral of p over I is a linear function of p; therefore it can be represented as
above. This proves that given any n distinct points $t_1, \ldots, t_n$, there is a formula of
form (9) that is valid for all polynomials of degree less than n. □
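The proof shows that the weights exist but does not compute them. One way to find them, sketched below under assumptions that are not in the text, is to require (9) for the monomials $1, t, \ldots, t^{n-1}$ and solve the resulting n equations exactly. The interval is taken to be [-1, 1], the sample points are hypothetical, and `solve` and `quadrature_weights` are illustrative names.

```python
from fractions import Fraction

def solve(A, b):
    """Solve the square system A w = b exactly by Gauss-Jordan elimination."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Distinct points give an invertible system, so a pivot always exists.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col:
                M[r] = [v - M[r][col] * p for v, p in zip(M[r], M[col])]
    return [row[n] for row in M]

def quadrature_weights(points):
    """Weights m_j making (9) exact on [-1, 1] for every p of degree < n."""
    n = len(points)
    # Row k states formula (9) for the monomial p(t) = t**k.
    A = [[Fraction(t) ** k for t in points] for k in range(n)]
    # Integral of t**k over [-1, 1]: 2/(k+1) for even k, 0 for odd k.
    b = [Fraction(2, k + 1) if k % 2 == 0 else Fraction(0) for k in range(n)]
    return solve(A, b)

weights = quadrature_weights([Fraction(-1, 2), Fraction(0), Fraction(1, 2)])
print(weights)
```

Since both sides of (9) are linear in p, exactness on the monomials of degree less than n gives exactness on all of X.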


EXERCISE 4. In Theorem 7 take the interval I to be $[-1, 1]$, and take n to be 3.
Choose the three points to be $t_1 = -a$, $t_2 = 0$, and $t_3 = a$.


(i) Determine the weights $m_1, m_2, m_3$ so that (9) holds for all polynomials of
degree < 3.


(ii) Show that for $a > \sqrt{1/3}$, all three weights are positive.
(iii) Show that for $a = \sqrt{3/5}$, (9) holds for all polynomials of degree < 6.


EXERCISE 5. In Theorem 7 take the interval I to be $[-1, 1]$, and take n = 4.
Choose the four points to be $-a, -b, b, a$.


(i) Determine the weights $m_1, m_2, m_3$, and $m_4$ so that (9) holds for all
polynomials of degree < 4.


(ii) For what values of a and b are the weights positive?
EXERCISE 6. Let $P_2$ be the linear space of all polynomials

    $p(x) = a_0 + a_1 x + a_2 x^2$

with real coefficients and degree $\le 2$. Let $\xi_1, \xi_2, \xi_3$ be three distinct real numbers, and
then define

    $\ell_j(p) = p(\xi_j)$ for j = 1, 2, 3.


(a) Show that $\ell_1, \ell_2, \ell_3$ are linearly independent linear functions on $P_2$.
(b) Show that $\ell_1, \ell_2, \ell_3$ is a basis for the dual space $P_2'$.


(c) (1) Suppose $\{e_1, \ldots, e_n\}$ is a basis for the vector space V. Show there exist
linear functions $\{\ell_1, \ldots, \ell_n\}$ in the dual space V' defined by

    $\ell_i(e_j) = 1$ if $i = j$,  $\ell_i(e_j) = 0$ if $i \ne j$.

Show that $\{\ell_1, \ldots, \ell_n\}$ is a basis of V', called the dual basis.


(2) Find the polynomials $p_1(x), p_2(x), p_3(x)$ in $P_2$ for which $\ell_1, \ell_2, \ell_3$ is the
dual basis in $P_2'$.


EXERCISE 7. Let W be the subspace of $\mathbb{R}^4$ spanned by (1, 0, -1, 2) and (2, 3,
1, 1).
Which linear functions $\ell(x) = c_1 x_1 + c_2 x_2 + c_3 x_3 + c_4 x_4$ are in the annihilator of W?


CHAPTER 3 


Linear Mappings 


Chapter 3 abstracts the concept of a matrix as a linear mapping of one linear space 
into another. Again I point out that no greater generality is achieved, so what has 
been gained? 

First of all, simplicity of notation; we can refer to mappings by single symbols, 
instead of rectangular arrays of numbers. The abstract view leads to simple, 
transparent proofs. This is strikingly illustrated by the proof of the associative law of 
matrix multiplication and by the proof of the basic result that the column rank of a 
matrix equals its row rank. 

Many important mappings are not presented in matrix form; see, for example, the 
first two applications presented in this chapter. 

Last but not least, the abstract view is indispensable for infinite-dimensional 
spaces. There the view of mappings as infinite matrices has held up progress until it 
was replaced by an abstract concept. 

A mapping from one set X into another set U is a function whose arguments are 
points of X and whose values are points of U: 


    $T: X \to U$.
In this chapter we discuss a class of very special mappings: 


(i) Both X, called the domain space, and U, called the target space, are linear
spaces over the same field.


(ii) A mapping $T: X \to U$ is called linear if it is additive, that is, satisfies

    $T(x + y) = T(x) + T(y)$

for all x, y in X, and if it is homogeneous, that is, satisfies

    $T(kx) = kT(x)$

for all x in X and all k in K. The value of T at x is written multiplicatively as
Tx; the additive property becomes the distributive law: T(x + y) = Tx + Ty.


Other names for linear mapping are linear transformation and linear operator. 
Example 1. Any isomorphism. 

Example 2. X = U = polynomials of degree less than n in s; T = d/ds.
Example 3. X = U = $\mathbb{R}^2$, T rotation around the origin by angle $\theta$.


Example 4. X any linear space, U the one dimensional space K, T any linear 
function on X. 


Example 5. X = U = Differentiable functions, T linear differential operator. 
Example 6. X = U = $C_0(\mathbb{R})$, $(Tf)(x) = \int_{-1}^{1} f(y)(x - y)^2\,dy$.


Example 7. X = $\mathbb{R}^n$, U = $\mathbb{R}^m$, u = Tx defined by

    $u_i = \sum_{j=1}^{n} t_{ij} x_j$,  i = 1, ..., m.

Here $u = (u_1, \ldots, u_m)$, $x = (x_1, \ldots, x_n)$.


Theorem 1. (a) The image of a subspace of X under a linear map T is a 
subspace of U. 

(b) The inverse image of a subspace of U, that is, the set of all vectors in X
mapped by T into the subspace, is a subspace of X.


EXERCISE 1. Prove Theorem 1.


Definition. The range of T is the image of X under T; it is denoted as $R_T$. By
part (a) of Theorem 1, it is a subspace of U.


Definition. The nullspace of T is the set of vectors x in X mapped into 0 by T: Tx = 0; it is
denoted as $N_T$. By part (b) of Theorem 1, it is a subspace of X.


The following result is a workhorse of the subject, a fundamental result about 
linear maps. 


Theorem 2. Let $T: X \to U$ be a linear map; then

    $\dim N_T + \dim R_T = \dim X$.

Proof. Since T maps $N_T$ into 0, $Tx_1 = Tx_2$ when $x_1$ and $x_2$ are equivalent mod $N_T$.
So we can define T acting on the quotient space $X/N_T$ by setting

    $T\{x\} = Tx$.

T is an isomorphism between $X/N_T$ and $R_T$; since isomorphic spaces have the same
dimension,

    $\dim X/N_T = \dim R_T$.

According to Theorem 6 of Chapter 1, $\dim X/N_T = \dim X - \dim N_T$; combined with
the relation above we get Theorem 2. □


Corollaries. A. Suppose dim U < dim X; then

    Tx = 0 for some $x \ne 0$.


B. Suppose dim U = dim X and the only vector satisfying Tx = 0 is x = 0. Then

    $R_T = U$.


Proof. A. $\dim R_T \le \dim U < \dim X$; it follows therefore from Theorem 2 that
$\dim N_T > 0$, that is, that $N_T$ contains some vector not equal to 0.

B. By hypothesis, $N_T = \{0\}$, so $\dim N_T = 0$. It follows then from Theorem 2 and
from the assumption in B that

    $\dim R_T = \dim X = \dim U$.

By Corollary 6' of Chapter 1, a subspace whose dimension is the dimension of the
whole space is the whole space; therefore $R_T = U$. □


Theorem 2 and its corollaries have many applications, possibly more 
than any other theorem of mathematics. It is useful to have concrete versions of 
them. 

Corollary A'. X = $\mathbb{R}^n$, U = $\mathbb{R}^m$, m < n. Let T be any mapping of $\mathbb{R}^n \to \mathbb{R}^m$ as
in Example 7; since m = dim U < dim X = n, by Corollary A, the system of linear
equations

    $\sum_{j=1}^{n} t_{ij} x_j = 0$,  i = 1, ..., m    (1)

has a nontrivial solution, that is, one where at least one $x_j \ne 0$.

Corollary B'. X = $\mathbb{R}^n$, U = $\mathbb{R}^n$, T given by

    $\sum_{j=1}^{n} t_{ij} x_j = u_i$,  i = 1, ..., n.    (2)

If the homogeneous system of equations

    $\sum_{j=1}^{n} t_{ij} x_j = 0$,  i = 1, ..., n    (3)

has only the trivial solution $x_1 = \cdots = x_n = 0$, then the inhomogeneous system (2)
has a solution for all $u_1, \ldots, u_n$. Since the homogeneous system (3) has only the
trivial solution, the solution of (2) is uniquely determined.
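Theorem 2 and Corollary A' can be checked on a small system. The sketch below is an illustration only: the matrix `T` is a hypothetical example with m = 3 equations and n = 4 unknowns, and `rank_and_nullspace` is an assumed helper that row-reduces with exact fractions. It returns the dimension of the range together with a basis of the nullspace, so the two dimensions can be added.

```python
from fractions import Fraction

def rank_and_nullspace(T):
    """Row-reduce T; return (dim R_T, a basis of N_T)."""
    m, n = len(T), len(T[0])
    M = [[Fraction(v) for v in row] for row in T]
    pivots, r = [], 0
    for col in range(n):
        p = next((i for i in range(r, m) if M[i][col] != 0), None)
        if p is None:
            continue  # no pivot in this column: x_col is a free unknown
        M[r], M[p] = M[p], M[r]
        M[r] = [v / M[r][col] for v in M[r]]
        for i in range(m):
            if i != r:
                M[i] = [v - M[i][col] * w for v, w in zip(M[i], M[r])]
        pivots.append(col)
        r += 1
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        # Set one free unknown to 1 and solve for the pivot unknowns.
        x = [Fraction(0)] * n
        x[free] = Fraction(1)
        for row, pc in zip(M, pivots):
            x[pc] = -row[free]
        basis.append(x)
    return r, basis

T = [[1, 2, 3, 4], [2, 4, 6, 8], [1, 0, 1, 0]]   # m = 3 equations, n = 4 unknowns
rank, null_basis = rank_and_nullspace(T)
print(rank, len(null_basis))
```

Because m < n here, the nullspace basis is nonempty, which is the nontrivial solution promised by Corollary A'.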
Application 1. Take X equal to the space of all polynomials p(s) with complex
coefficients of degree less than n, and take U = $\mathbb{C}^n$. We choose $s_1, \ldots, s_n$ as n
distinct complex numbers, and define the linear mapping $T: X \to U$ by

    $Tp = (p(s_1), \ldots, p(s_n))$.

We claim that $N_T$ is trivial: for Tp = 0 means that $p(s_1) = 0, \ldots, p(s_n) = 0$, that is,
that p has zeros at $s_1, \ldots, s_n$. But a polynomial p of degree less than n cannot have n
distinct zeros, unless p = 0. Then by Corollary B, the range of T is all of U; that is,
the values of p at $s_1, \ldots, s_n$ can be prescribed arbitrarily.
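Corollary B guarantees that an interpolating polynomial exists; it does not construct one. A standard construction, swapped in here for illustration and not taken from the text, is the Lagrange formula. The sketch uses rational rather than complex points so that the arithmetic stays exact; `interpolate`, `points`, and `values` are hypothetical names.

```python
from fractions import Fraction

def interpolate(points, values):
    """Return p of degree < n with p(points[j]) = values[j] (Lagrange form)."""
    def p(s):
        total = Fraction(0)
        for k, (sk, uk) in enumerate(zip(points, values)):
            term = Fraction(uk)
            for j, sj in enumerate(points):
                if j != k:
                    # This factor is 1 at s = sk and 0 at s = sj.
                    term *= Fraction(s - sj, sk - sj)
            total += term
        return total
    return p

points = [0, 1, 3]      # hypothetical distinct points s_1, s_2, s_3
values = [2, -1, 7]     # arbitrary prescribed values
p = interpolate(points, values)
print([p(s) for s in points])
```

Evaluating p at the chosen points returns the prescribed values, so the vector `values` lies in the range of T.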


Application 2. X is the space of polynomials with real coefficients of degree
< n, U = $\mathbb{R}^n$. We choose n pairwise disjoint intervals $S_1, \ldots, S_n$ on the real axis. We
define $\bar p_j$ to be the average value of p over $S_j$:

    $\bar p_j = \frac{1}{|S_j|} \int_{S_j} p(s)\,ds$,  $|S_j|$ = length of $S_j$.    (4)

We define the linear mapping $T: X \to U$ by

    $Tp = (\bar p_1, \ldots, \bar p_n)$.

We claim that the nullspace of T is trivial: for, if $\bar p_j = 0$, p changes sign in $S_j$ and so
vanishes somewhere in $S_j$. Since the $S_j$ are pairwise disjoint, p would have n distinct
zeros, too many for a polynomial of degree less than n. Then by Corollary B the
range of T is all of U; that means that the average values of p over the intervals
$S_1, \ldots, S_n$ can be prescribed arbitrarily.

Application 3. In constructing numerical approximations to solutions of the
Laplace equation in a bounded domain G of the plane,

    $\Delta u = u_{xx} + u_{yy} = 0$ in G,    (5)

with u prescribed on the boundary of G, one fills G, approximately, with a lattice and
replaces the second partial derivatives with centered differences:

    $u_{xx} \approx \frac{u_W - 2u_O + u_E}{h^2}$,  $u_{yy} \approx \frac{u_N - 2u_O + u_S}{h^2}$,    (6)

where

          N
        W O E
          S

and h is the mesh spacing. Setting (6) into (5) gives the following relations:

    $u_O = \frac{u_W + u_N + u_E + u_S}{4}$.    (7)

This equation relates the value $u_O$ of u at each lattice point O in the domain G to the
values of u at the four lattice neighbors of O. In case any lattice neighbor of O lies
outside G, we set the value of u there equal to the boundary value of u at the nearest
boundary point. The resulting set of equations (7) is a system of n equations for n
unknowns of the form (2); n is equal to the number of lattice points in G.

We claim that the corresponding homogeneous equations have only the trivial
solution $u_O = 0$ for all lattice points. The homogeneous equations correspond to
taking the boundary values to be zero. Now take any solution of the homogeneous
equations and denote by $u_{\max}$ the maximal value of $u_O$ over all lattice points in G.
That maximum is assumed at some point O of G; it follows from (7) that then
$u = u_{\max}$ at all four lattice neighbors of O. Repeating this argument we eventually
reach a lattice neighbor which falls outside G. Since u was set to zero at all such
points, we conclude that $u_{\max} = 0$. Similarly we show that $u_{\min} = 0$; together these
imply that $u_O = 0$ for all lattice points for a solution of the homogeneous equation.
By Corollary B', the system of equations (7), with arbitrary boundary data, has a
unique solution.
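Equation (7) also suggests a simple way to approximate that unique solution: sweep over the lattice repeatedly, replacing each value by the mean of its four neighbors. The sketch below is an illustration under assumptions not made in the text: G is a square block of n by n lattice points, the boundary data are hypothetical, and the iteration (Gauss-Seidel relaxation) is a technique chosen here, not one the text describes.

```python
def solve_laplace(n, boundary, sweeps=2000):
    """Gauss-Seidel sweeps for equation (7) on an n-by-n block of lattice points.

    Interior points are (i, j) with 1 <= i, j <= n; boundary(i, j) supplies the
    prescribed value at lattice points just outside the block.
    """
    size = n + 2
    u = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i in (0, n + 1) or j in (0, n + 1):
                u[i][j] = float(boundary(i, j))
    for _ in range(sweeps):
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                # Equation (7): u_O is the mean of its four lattice neighbors.
                u[i][j] = (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]) / 4
    return u

# Hypothetical boundary data: the values of the harmonic function u(x, y) = x.
u = solve_laplace(3, lambda i, j: i)
print([round(u[i][2], 6) for i in range(1, 4)])
```

For this boundary data the lattice function $u_O = i$ satisfies (7) exactly, so by uniqueness the sweeps should approach it.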


EXERCISE 2. Let

    $\sum_{j=1}^{n} t_{ij} x_j = u_i$,  i = 1, ..., m

be an overdetermined system of linear equations, that is, the number m of equations
is greater than the number n of unknowns $x_1, \ldots, x_n$. Take the case that in spite of the
overdeterminacy, this system of equations has a solution, and assume that this
solution is unique. Show that it is possible to select a subset of n of these equations
which uniquely determine the solution.


We turn now to the rudiments of the algebra of linear mappings, that is, their
addition and multiplication. Suppose that T and S are both linear maps of $X \to U$;
then we define their sum T + S by setting for each vector x in X,

    $(T + S)(x) = Tx + Sx$.

Clearly, under this definition T + S is again a linear map of $X \to U$. We define kT
similarly, and we get another linear map.

It is not hard to show that under the above definition the set of linear mappings of
$X \to U$ itself forms a linear space. This space is denoted by $\mathcal{L}(X, U)$.

Let T, S be maps, not necessarily linear, of X into U, and U into V, respectively, X,
U, V arbitrary sets. Then we can define the composition of T with S, a mapping of X
into V obtained by letting T act first, followed by S, schematically

    $X \xrightarrow{T} U \xrightarrow{S} V$.

The composite is denoted by $S \circ T$:

    $(S \circ T)(x) = S(T(x))$.

Note that composition is associative: if R maps V into Z, then

    $R \circ (S \circ T) = (R \circ S) \circ T$.


Theorem 3. (i) The composite of linear mappings is also a linear mapping.
(ii) Composition is distributive with respect to the addition of linear maps, that is,

    $(R + S) \circ T = R \circ T + S \circ T$

and

    $S \circ (T + P) = S \circ T + S \circ P$,

where R and S map $U \to V$ and P and T map $X \to U$.

EXERCISE 3. Prove Theorem 3.


On account of this distributive property, coupled with the associative law that 
holds generally, composition of linear maps is denoted as multiplication: 


    $S \circ T = ST$.


We warn the reader that this kind of multiplication is generally not commutative; for 
example, TS may not even be defined when ST is, much less equal to it. 


Example 8. X = U = V = polynomials in s, T = d/ds, S = multiplication
by s.


Example 9. X = U = V = $\mathbb{R}^3$,

    S: rotation around the $x_1$ axis by 90 degrees,
    T: rotation around the $x_2$ axis by 90 degrees.


EXERCISE 4. Show that S and T in Examples 8 and 9 are linear and that
$ST \ne TS$.
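The failure of commutativity in Example 8 can be observed on a single polynomial; this is a spot check, not a substitute for the proof asked for in the exercise. In the sketch below a polynomial is stored as its list of coefficients $[a_0, a_1, \ldots]$, which is a representation assumed here, and the sample polynomial `p` is hypothetical.

```python
def T(p):
    """d/ds on a polynomial given by its coefficients [a_0, a_1, ...]."""
    return [k * a for k, a in enumerate(p)][1:] or [0]

def S(p):
    """Multiplication by s: shift every coefficient up one degree."""
    return [0] + list(p)

def trim(p):
    """Drop trailing zero coefficients so equal polynomials compare equal."""
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p

p = [1, 2, 3]                # hypothetical sample: p(s) = 1 + 2s + 3s^2
st = trim(S(T(p)))           # ST p: differentiate first, then multiply by s
ts = trim(T(S(p)))           # TS p: multiply by s first, then differentiate
print(st, ts)
```

For this sample the difference TSp - STp is p itself, as the product rule for (sp)' predicts.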


Definition. A linear map is called invertible if it is 1-to-1 and onto, that is, if it is
an isomorphism. The inverse is denoted as $T^{-1}$.


EXERCISE 5. Show that if T is invertible, $TT^{-1}$ is the identity.


Theorem 4. (i) The inverse of an invertible linear map is linear.
(ii) If S and T are both invertible, and if ST is defined, then ST also is invertible,
and

    $(ST)^{-1} = T^{-1}S^{-1}$.


EXERCISE 6. Prove Theorem 4. 


Let T be a linear map $X \to U$, and l a linear function on U, that is, l is an element of U'.
Then the product (i.e., composite) lT is a linear mapping of X into K, that is, an
element of X'; denote this element by m:

    $m(x) = l(Tx)$.    (8)

This defines an assignment of an element m of X' to every element l of U'. It is easy
to deduce from (8) that this assignment is a linear mapping $U' \to X'$; it is called the
transpose of T and is denoted by T'.

Using the notation (5) in Chapter 2 to denote the value of a linear function, we can
rewrite (8) as

    $(m, x) = (l, Tx)$.    (8)'

Using the notation m = T'l, this can be written as

    $(T'l, x) = (l, Tx)$.    (9)

EXERCISE 7. Show that whenever meaningful,

    $(ST)' = T'S'$,  $(T + R)' = T' + R'$,  and  $(T^{-1})' = (T')^{-1}$.


Example 10. X = $\mathbb{R}^n$, U = $\mathbb{R}^m$, T as in Example 7:

    $u_i = \sum_j t_{ij} x_j$.    (10)

U' is then also $\mathbb{R}^m$, X' = $\mathbb{R}^n$, with $(l, u) = \sum_{i=1}^{m} l_i u_i$, $(m, x) = \sum_{j=1}^{n} m_j x_j$. Then with
u = Tx, using (10) we have

    $(l, u) = \sum_i l_i u_i = \sum_i \sum_j l_i t_{ij} x_j = \sum_j m_j x_j = (m, x)$,

where m = T'l, with

    $m_j = \sum_i l_i t_{ij}$.    (11)


EXERCISE 8. Show that if X" is identified with X and U" with U via (5) in
Chapter 2, then

    T" = T.


We shall show in Chapter 4 that if a mapping T is interpreted as a matrix, its
transpose T' is obtained by making the columns of T the rows of T'.
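Relation (9) together with formulas (10) and (11) can be verified on concrete arrays. The following sketch is illustrative only; the matrix `T` and the arrays `x` and `l` are hypothetical sample data, and the function names are assumptions.

```python
def apply(T, x):
    """u = Tx with u_i = sum_j t_ij x_j, as in formula (10)."""
    return [sum(t * xj for t, xj in zip(row, x)) for row in T]

def apply_transpose(T, l):
    """m = T'l with m_j = sum_i l_i t_ij, as in formula (11)."""
    return [sum(li * row[j] for li, row in zip(l, T)) for j in range(len(T[0]))]

def pair(l, x):
    """Scalar product (l, x) = sum_k l_k x_k."""
    return sum(a * b for a, b in zip(l, x))

T = [[1, 2, 0], [3, -1, 4]]   # hypothetical map from R^3 to R^2
x = [2, 1, -1]
l = [5, -2]
print(pair(l, apply(T, x)), pair(apply_transpose(T, l), x))
```

Both printed numbers agree, which is relation (9) for this particular choice of T, l, and x.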
We recall from Chapter 2 the notion of the annihilator of a subspace.

Theorem 5. The annihilator of the range of T is the nullspace of its transpose:

    $R_T^\perp = N_{T'}$.    (12)


Proof. By the definition in Chapter 2 of annihilator, the annihilator of the range
$R_T$ consists of those linear functions l defined on the target space U for which

    (l, u) = 0 for all u in $R_T$.

Since $R_T$ consists of the vectors u = Tx, x in X, we can rewrite the above as

    (l, Tx) = 0 for all x.

Using (9), we can rewrite this as

    (T'l, x) = 0 for all x.

It follows that l is in $R_T^\perp$ iff T'l = 0; this proves Theorem 5. □

Now take the annihilator of both sides of (12). According to Theorem 5 of
Chapter 2, the annihilator of $R_T^\perp$ is $R_T$ itself. In this way we obtain the following
theorem.

Theorem 5'. The range of T is the annihilator of the nullspace of T':

    $R_T = N_{T'}^\perp$.    (12)'


(12)' is a very useful characterization of the range of a mapping. Next we give
another consequence of Theorem 5.


Theorem 6.

    $\dim R_T = \dim R_{T'}$.    (13)

Proof. We apply Theorem 4 of Chapter 2 to U and its subspace $R_T$:

    $\dim R_T^\perp + \dim R_T = \dim U$.

Next we use Theorem 2 of this chapter applied to $T': U' \to X'$:

    $\dim N_{T'} + \dim R_{T'} = \dim U'$.

According to Theorem 2, Chapter 2, dim U = dim U'; according to Theorem 5 of
this chapter, $R_T^\perp = N_{T'}$, and so $\dim R_T^\perp = \dim N_{T'}$. So we deduce (13) from the last
two equations. □

The following is an easy consequence of Theorem 6.


Theorem 6'. Let T be a linear mapping of X into U, and assume that X and U
have the same dimension. Then

    $\dim N_T = \dim N_{T'}$.    (13)'

Proof. According to Theorem 2, applied to both T and T',

    $\dim N_T = \dim X - \dim R_T$,
    $\dim N_{T'} = \dim U' - \dim R_{T'}$.

Since dim U = dim U' is assumed to be the same as dim X, (13)' follows from the
above relations and (13). □


Theorem 6 is an abstract version of the classical result that the column rank and
row rank of a matrix are equal. The usual proofs of this result are abstruse and
unclear.

We turn now to linear mappings of a linear space X into itself. The aggregate of
such mappings is denoted as $\mathcal{L}(X, X)$; they are a particularly important and
interesting class of maps. Any two such maps can be added and multiplied, that is,
composed, and can be multiplied by a scalar. Thus $\mathcal{L}(X, X)$ is an algebra. We
investigate now briefly some of the algebraic aspects of $\mathcal{L}(X, X)$.

First we remark that $\mathcal{L}(X, X)$ is an associative, but not commutative algebra, with
a unit: the role of the unit is played by the identity map I, defined by Ix = x. The zero
map 0 is defined by 0x = 0. $\mathcal{L}(X, X)$ contains divisors of zero, that is, pairs of
mappings S and T whose product ST is 0, but neither of which is 0. To see this,
choose T to be any nonzero mapping with a nontrivial nullspace $N_T$, and S to be any
nonzero mapping whose range $R_S$ is contained in $N_T$. Clearly, TS = 0.

There are mappings $D \ne 0$ whose square $D^2$ is zero. As an example, take X to be
the linear space of polynomials of degree less than 2. Differentiation D maps this
space into itself. Since the second derivative of every polynomial of degree less than
2 is zero, $D^2 = 0$, but clearly $D \ne 0$.


EXERCISE 9. Show that if A in $\mathcal{L}(X, X)$ is a left inverse of B in $\mathcal{L}(X, X)$, that is,
AB = I, then it is also a right inverse: BA = I.


We have seen in Theorem 4 that the product of invertible elements is invertible.
Therefore the set of invertible elements of $\mathcal{L}(X, X)$ forms a group under
multiplication. This group depends only on the dimension of X, and the field K of
scalars. It is denoted as GL(n, K), n = dim X.

Given an invertible element S of $\mathcal{L}(X, X)$, we assign to each M in $\mathcal{L}(X, X)$ the
element $M_S$ constructed as follows:

    $M_S = SMS^{-1}$.    (14)

This assignment $M \to M_S$ is called a similarity transformation; M is said to be
similar to $M_S$.


Theorem 7. (a) Every similarity transformation is an automorphism of $\mathcal{L}(X, X)$; it
maps sums into sums, products into products, scalar multiples into scalar
multiples:

    $(kM)_S = kM_S$.    (15)
    $(M + K)_S = M_S + K_S$.    (15)'
    $(MK)_S = M_S K_S$.    (15)"


(b) The similarity transformations form a group:

    $(M_S)_T = M_{TS}$.    (16)

Proof. (15) and (15)' are obvious; to verify (15)" we use the definition (14):

    $M_S K_S = SMS^{-1} SKS^{-1} = SMKS^{-1} = (MK)_S$,

where we made use of the associative law.
The verification of (16) is analogous; by (14),

    $(M_S)_T = T(SMS^{-1})T^{-1} = TSM(TS)^{-1} = M_{TS}$;

here we made use of the associative law, and that $(TS)^{-1} = S^{-1}T^{-1}$. □
Theorem 8. Similarity is an equivalence relation; that is, it is:


(i) Reflexive. M is similar to itself.
(ii) Symmetric. If M is similar to K, then K is similar to M.
(iii) Transitive. If M is similar to K, and K is similar to L, then M is similar
to L.


Proof. (i) is true because we can in the definition (14) choose S = I.
(ii) M similar to K means that

    $K = SMS^{-1}$.    (14)'

Multiply both sides by S on the right and $S^{-1}$ on the left, and we see that K is similar
to M.
(iii) If K is similar to L, then

    $L = TKT^{-1}$,    (14)"

where T is some invertible mapping. Multiply both sides of (14)' by $T^{-1}$ on the right
and by T on the left; we get

    $TKT^{-1} = TSMS^{-1}T^{-1}$.

According to (14)", the left-hand side is L. The right-hand side can be written as

    $(TS)M(TS)^{-1}$,

which is similar to M. □
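Identities (15)" and (16) lend themselves to a quick check with 2-by-2 matrices, which stand in here for elements of the algebra. The sketch below is an illustration under that assumption; the matrices `M`, `K`, `S`, `T` are hypothetical, and exact fractions are used so that equality can be tested directly.

```python
from fractions import Fraction

def mul(A, B):
    """Matrix product: entry (k, j) is row k of A times column j of B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inv2(S):
    """Inverse of an invertible 2-by-2 matrix."""
    (a, b), (c, d) = S
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

def similar(M, S):
    """M_S = S M S^{-1}, definition (14)."""
    return mul(mul(S, M), inv2(S))

# Hypothetical 2-by-2 matrices standing in for elements of the algebra.
M, K = [[1, 2], [3, 4]], [[0, 1], [1, 1]]
S, T = [[2, 1], [1, 1]], [[1, 3], [0, 1]]
print(similar(mul(M, K), S) == mul(similar(M, S), similar(K, S)))   # (15)"
print(similar(similar(M, S), T) == similar(M, mul(T, S)))           # (16)
```

Note the order in (16): applying S first and then T corresponds to the single transformation by TS, not ST.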
EXERCISE 10. Show that if M is invertible, and similar to K, then K also is
invertible, and $K^{-1}$ is similar to $M^{-1}$.

Multiplication in $\mathcal{L}(X, X)$ is not commutative, that is, AB is in general not equal
to BA. Yet they are not totally unrelated.


Theorem 9. If either A or B in $\mathcal{L}(X, X)$ is invertible, then AB and BA are
similar.


EXERCISE 11. Prove Theorem 9.


Given any element A of $\mathcal{L}(X, X)$ we can, by addition and multiplication, form all
polynomials in A:

    $a_N A^N + a_{N-1} A^{N-1} + \cdots + a_0$;    (17)

we can write (17) as p(A), where

    $p(s) = a_N s^N + \cdots + a_0$.    (17)'

The set of all polynomials in A forms a subalgebra of $\mathcal{L}(X, X)$; this subalgebra is
commutative. Such commutative subalgebras play a big role in spectral theory,
discussed in Chapters 6 and 8.


An important class of mappings of a linear space X into itself are projections. 


Definition. A linear mapping $P: X \to X$ is called a projection if it satisfies

    $P^2 = P$.


Example 11. X is the space of vectors $x = (a_1, a_2, \ldots, a_n)$, P defined as

    $Px = (0, 0, a_3, \ldots, a_n)$.

That is, the action of P is to set the first two components of x equal to zero.

EXERCISE 12. Show that P defined above is a linear map, and that it is a
projection.


Example 12. Let X be the space of continuous functions f in the interval $[-1, 1]$;
define Pf to be the even part of f, that is,

    $(Pf)(x) = \frac{f(x) + f(-x)}{2}$.


EXERCISE 13. Prove that P defined above is linear, and that it is a projection.


Definition. The commutator of two mappings A and B of X into X is AB - BA.
Two mappings of X into X commute if their commutator is zero.


Remark. We can prove Corollary A’ directly by induction on the number of 
equations m, using one of the equations to express one of the unknowns $x_j$ in terms of
the others. By substituting this expression for $x_j$ into the remaining equations, we
have reduced the number of equations and the number of unknowns by one. 

The practical execution of such a scheme has pitfalls when the number of 
equations and unknowns is large. One has to pick intelligently the unknown to be 
eliminated and the equation that is used to eliminate it. We shall take up these
matters in the next chapter. 


Definition. The rank of a linear mapping is the dimension of its range. 


EXERCISE 14. Suppose T is a linear map of rank 1 of a finite dimensional vector
space into itself.
(a) Show there exists a unique number c such that $T^2 = cT$.
(b) Show that if $c \ne 1$ then I - T has an inverse. (As usual I denotes the identity
map Ix = x.)


EXERCISE 15. Suppose T and S are linear maps of a finite dimensional vector
space into itself. Show that the rank of ST is less than or equal to the rank of S. Show
that the dimension of the nullspace of ST is less than or equal to the sum of the
dimensions of the nullspaces of S and of T.


CHAPTER 4 


Matrices 


In Example 7 of Chapter 3 we defined a class of mappings $T: \mathbb{R}^n \to \mathbb{R}^m$ where the
ith component of u = Tx is expressed in terms of the components $x_j$ of x by the
formula

    $u_i = \sum_{j=1}^{n} t_{ij} x_j$,  i = 1, ..., m,    (1)

and the $t_{ij}$ are arbitrary scalars. These mappings are linear; conversely, we have the
following theorem.


Theorem 1. Every linear map Tx = u from $\mathbb{R}^n$ to $\mathbb{R}^m$ can be written in form (1).


Proof. The vector x can be expressed as a linear combination of the unit vectors
$e_1, \ldots, e_n$, where $e_j$ has jth component 1, all others 0:

    $x = \sum_j x_j e_j$.    (2)

Since T is linear,

    $u = Tx = \sum_j x_j Te_j$.    (3)

Denote the ith component of $Te_j$ by $t_{ij}$:

    $t_{ij} = (Te_j)_i$.    (4)

It follows from (3) and (4) that the ith component $u_i$ of u is

    $u_i = \sum_j t_{ij} x_j$,

exactly as in formula (1). □


It is convenient and traditional to arrange the coefficients $t_{ij}$ appearing in (1) in a
rectangular array,

    $\begin{pmatrix} t_{11} & \cdots & t_{1n} \\ \vdots & & \vdots \\ t_{m1} & \cdots & t_{mn} \end{pmatrix}$.    (5)

Such an array is called an m by n (m × n) matrix, m being the number of rows,
n the number of columns. A matrix that has the same number of rows and
columns is called a square matrix. The numbers $t_{ij}$ are called the entries of the
matrix T.

According to Theorem 1, there is a 1-to-1 correspondence between m × n
matrices and linear mappings $T: \mathbb{R}^n \to \mathbb{R}^m$. We shall denote the (ij)th entry $t_{ij}$ of the
matrix identified with T by

    $T_{ij} = (T)_{ij}$.    (5)'


A matrix T can be thought of as a row of column vectors, or a column of row
vectors:

    $T = (c_1, \ldots, c_n) = \begin{pmatrix} r_1 \\ \vdots \\ r_m \end{pmatrix}$,  $c_j = \begin{pmatrix} t_{1j} \\ \vdots \\ t_{mj} \end{pmatrix}$,  $r_i = (t_{i1}, \ldots, t_{in})$.    (6)

According to (4), the ith component of $Te_j$ is $t_{ij}$; according to (6), the ith component
of $c_j$ is $t_{ij}$. Thus

    $Te_j = c_j$.    (7)

This formula shows that, as a consequence of the decision to put $t_{ij}$ in the ith row and
jth column, the image of $e_j$ under T appears as a column vector. To be consistent, we
shall write all vectors in U = $\mathbb{R}^m$ as column vectors:

    $u = \begin{pmatrix} u_1 \\ \vdots \\ u_m \end{pmatrix}$.

We shall also write elements of X = $\mathbb{R}^n$ as column vectors:

    $x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$.

The matrix representation (6) of a linear map l from $\mathbb{R}^n$ to $\mathbb{R}$ is a single row vector
of n components:

    $l = (l_1, \ldots, l_n)$,  $l(x) = l_1 x_1 + \cdots + l_n x_n$.    (8)

We define by (8) the product of a row vector r with a column vector x, in this order. It
can be used to give a compact description of formula (1) giving the action of a matrix
on a column vector:

    $Tx = \begin{pmatrix} r_1 x \\ \vdots \\ r_m x \end{pmatrix}$,    (9)

where $r_1, \ldots, r_m$ are the rows of the matrix T.
In Chapter 3 we have described the algebra of linear mappings. Since matrices 
represent linear mappings of R” into E", there is a corresponding algebra of matrices. 
Let S and T be m x n matrices, representing mappings of R” to R”. Their sum 
T +S represents the sum of these mappings. It follows from formula (4) that the 
entries of T +S are the sums of the corresponding entries of T and 5: 


Next we show how to use (8) and (9) to calculate the elements of the product of 
two matrices. Let T, 5 be matrices 


T: RO R". S. R" Re 


Since the target space of $T$ is the domain space of $S$, the product $ST$ is well-defined.
According to formula (7) applied to $ST$, the $j$th column of $ST$ is

$$STe_j.$$

According to (7), $Te_j = c_j$; applying (9) to $x = Te_j$, and $S$ in place of $T$, gives

$$STe_j = Sc_j = \begin{pmatrix} s_1 c_j \\ \vdots \\ s_l c_j \end{pmatrix},$$

where $s_k$ denotes the $k$th row of $S$. Thus we deduce this rule:

Rule of matrix multiplication. Let $T$ be an $m \times n$ matrix and $S$ an $l \times m$ matrix.
Then the product $ST$ is an $l \times n$ matrix whose $(kj)$th entry is the product of the $k$th
row of $S$ and the $j$th column of $T$:

$$(ST)_{kj} = s_k c_j, \tag{10}$$

where $s_k$ is the $k$th row of $S$ and $c_j$ is the $j$th column of $T$.
In terms of entries,

$$(ST)_{kj} = \sum_i S_{ki} T_{ij}. \tag{10$'$}$$


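Example 1. $\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix} = \begin{pmatrix} 19 & 22 \\ 43 & 50 \end{pmatrix}.$

Example 2. $\begin{pmatrix} 3 \\ 4 \end{pmatrix} (1 \;\; 2) = \begin{pmatrix} 3 & 6 \\ 4 & 8 \end{pmatrix}.$

Example 3. $(1 \;\; 2) \begin{pmatrix} 3 \\ 4 \end{pmatrix} = (11).$

Example 4. $(1 \;\; 2) \begin{pmatrix} 3 & 4 \\ 5 & 6 \end{pmatrix} = (13 \;\; 16).$

Example 5. $\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 5 \\ 6 \end{pmatrix} = \begin{pmatrix} 17 \\ 39 \end{pmatrix}.$

Example 6. $(1 \;\; 2) \begin{pmatrix} 3 & 4 \\ 5 & 6 \end{pmatrix} \begin{pmatrix} 7 \\ 8 \end{pmatrix} = (1 \;\; 2) \begin{pmatrix} 53 \\ 83 \end{pmatrix} = (13 \;\; 16) \begin{pmatrix} 7 \\ 8 \end{pmatrix} = (219).$

Example 7. $\begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} = \begin{pmatrix} 23 & 34 \\ 31 & 46 \end{pmatrix}.$


Examples 1 and 7 show that matrix multiplication of square matrices need not be
commutative. Example 6 is an illustration of the associative property of matrix
multiplication.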
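
The entrywise rule (10)' translates directly into code. The following is a minimal sketch, not part of the original text: matrices are assumed to be stored as Python lists of rows, and the function name `matmul` is chosen here for illustration. It multiplies two $2 \times 2$ matrices in both orders to show that the products differ.

```python
def matmul(S, T):
    """Product ST of an l x m matrix S and an m x n matrix T (lists of rows)."""
    m = len(T)
    if any(len(row) != m for row in S):
        raise ValueError("number of columns of S must equal number of rows of T")
    n = len(T[0])
    # (ST)_kj = sum over i of S_ki * T_ij, as in rule (10)'
    return [[sum(S[k][i] * T[i][j] for i in range(m)) for j in range(n)]
            for k in range(len(S))]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))  # [[19, 22], [43, 50]]
print(matmul(B, A))  # [[23, 34], [31, 46]]
```

The same function handles row and column vectors if they are written as $1 \times n$ and $n \times 1$ matrices.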


EXERCISE 1. Let $A$ be an arbitrary $m \times n$ matrix, and let $D$ be an $m \times n$
diagonal matrix,

$$D_{ij} = \begin{cases} d_i & \text{if } i = j, \\ 0 & \text{if } i \ne j. \end{cases}$$

Show that the $i$th row of $DA$ equals $d_i$ times the $i$th row of $A$, and show that the $j$th
column of $AD$ equals $d_j$ times the $j$th column of $A$.

An $n \times n$ matrix $A$ represents a mapping of $\mathbb{R}^n$ into $\mathbb{R}^n$. If this mapping is
invertible, the matrix $A$ is called invertible.


Remark. Since the composition of linear mappings is associative, matrix
multiplication, which is the composition of mappings from $\mathbb{R}^n$ to $\mathbb{R}^m$ with mappings
from $\mathbb{R}^m$ to $\mathbb{R}^l$, also is associative.

We shall identify the dual of the space $\mathbb{R}^n$ of all column vectors with $n$
components as the space $(\mathbb{R}^n)'$ of all row vectors with $n$ components.

The action of a vector $l$ in the dual space $(\mathbb{R}^n)'$ on a vector $x$ of $\mathbb{R}^n$, denoted
by brackets in formula (6) of Chapter 2, shall be taken to be the matrix
product (8):

$$(l, x) = lx = l_1 x_1 + \cdots + l_n x_n. \tag{11}$$

Let $x$, $T$, and $l$ be linear mappings as follows:

$$l: \mathbb{R}^m \to \mathbb{R}, \qquad T: \mathbb{R}^n \to \mathbb{R}^m, \qquad x: \mathbb{R} \to \mathbb{R}^n.$$

According to the associative law,

$$(lT)x = l(Tx). \tag{12}$$

We identify $l$ with an element of $(\mathbb{R}^m)'$, and $lT$ with an element of $(\mathbb{R}^n)'$. Using the
notation (11) we can rewrite (12) as

$$(lT, x) = (l, Tx). \tag{13}$$

We recall now the definition of the transpose $T'$ of $T$, defined by formula (9) of
Chapter 3,

$$(T'l, x) = (l, Tx). \tag{13$'$}$$

Comparing (13) and (13)' we see that the matrix $T$ acting from the right on row
vectors is the transpose of the matrix $T$ acting from the left on column vectors.

To represent the transpose $T'$ as a matrix acting on column vectors, we change its
rows into columns, its columns into rows, and denote the resulting matrix as $T^T$:

$$(T^T)_{ij} = T_{ji}.$$

Given a row vector $r = (r_1, \dots, r_n)$, we denote by $r^T$ the column vector with the
same components. Similarly, given a column vector $c$, $c^T$ denotes the row vector with
the same components.

Next we turn to expressing the range of $T$ in matrix language. Setting (7),
$c_j = Te_j$, into (3), $Tx = \sum_j x_j Te_j$, gives

$$u = Tx = x_1 c_1 + \cdots + x_n c_n.$$
This gives the following theorem. 


Theorem 2. The range of T consists of all linear combinations of the columns 
of the matrix T. 


The dimension of this space is called in old-fashioned texts the column rank of $T$.
The row rank is defined similarly; (13)' shows that the row rank of $T$ is the
dimension of the range of $T'$. Since according to Theorem 6 of Chapter 3,

$$\dim R_T = \dim R_{T'},$$

we conclude that the column rank and row rank of a matrix are equal.


EXERCISE 2. Look up in any text the proof that the row rank of a matrix equals 
its column rank, and compare it to the proof given in the present text. 


We show now how to represent a linear mapping $T: X \to U$ by a matrix. We have
seen in Chapter 1 that $X$ is isomorphic to $\mathbb{R}^n$, $n = \dim X$, and $U$ isomorphic to $\mathbb{R}^m$,
$m = \dim U$. The isomorphisms are accomplished by choosing a basis in $X$,
$y_1, \dots, y_n$, and then mapping $y_j \to e_j$, $j = 1, \dots, n$:

$$B: X \to \mathbb{R}^n; \tag{14}$$

similarly,

$$C: U \to \mathbb{R}^m. \tag{14$'$}$$

Clearly, there are as many isomorphisms as there are bases. We can use any of these
isomorphisms to represent $T$ as $\mathbb{R}^n \to \mathbb{R}^m$, obtaining a matrix representation $M$:

$$CTB^{-1} = M. \tag{15}$$

When $T$ is a mapping of a space $X$ into itself, we use the same isomorphism in
(14) and (14)', that is, we take $B = C$. So in this case the matrix representing $T$ has
the form

$$BTB^{-1} = M. \tag{15$'$}$$

Suppose we change the isomorphism $B$. How does the matrix representing $T$
change? If $C$ is another isomorphism $X \to \mathbb{R}^n$, the new matrix $N$ representing $T$ is
$N = CTC^{-1}$. We can write, using the associative rule and (15)',

$$N = CTC^{-1} = CB^{-1}\,BTB^{-1}\,BC^{-1} = SMS^{-1}, \tag{16}$$

where $S = CB^{-1}$. Since $B$ and $C$ both map $X$ onto $\mathbb{R}^n$, $CB^{-1} = S$ maps $\mathbb{R}^n$ onto $\mathbb{R}^n$,
that is, $S$ is an invertible $n \times n$ matrix.

Two square matrices N and M related to each other as in (16) are called 
similar. Our analysis shows that similar matrices describe the same mapping of a 
space into itself, in different bases. Therefore we expect similar matrices to have 
the same intrinsic properties; we shall make the meaning of this more precise in 
Chapter 6. 

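We can write any $n \times n$ matrix $A$ in $2 \times 2$ block form:

$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix},$$

where $A_{11}$ is the submatrix of $A$ contained in the first $k$ rows and columns, $A_{12}$ the
submatrix contained in the first $k$ rows and the last $n - k$ columns, and so on.


EXERCISE 3. Show that the product of two matrices in $2 \times 2$ block form can be
evaluated as

$$\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix} = \begin{pmatrix} A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22} \\ A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22} \end{pmatrix}.$$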


The inversion of matrices will be discussed from a theoretical point of view in
Chapter 5, and from a numerical point of view in Chapter 17.
A matrix that is not invertible is called singular.


Definition. The square matrix $I$ whose elements are $I_{ij} = 0$ when $i \ne j$,
$I_{ii} = 1$ is called the unit matrix.


Definition. A square matrix $(t_{ij})$ for which $t_{ij} = 0$ for $i > j$ is called upper
triangular. Lower triangular is defined similarly.


Definition. A square matrix $(t_{ij})$ for which $t_{ij} = 0$ when $|i - j| > 1$ is called
tridiagonal.


EXERCISE 4. Construct two $2 \times 2$ matrices $A$ and $B$ such that $AB = 0$ but
$BA \ne 0$.

We turn now to the most important, certainly the oldest, way to solve sets of linear
equations, Gaussian elimination. We illustrate it on a simple example of four linear
equations for four unknowns $x_1$, $x_2$, $x_3$, and $x_4$:

$$\begin{aligned}
x_1 + 2x_2 + 3x_3 - x_4 &= -2, \\
2x_1 + 5x_2 + 4x_3 - 3x_4 &= 1, \\
2x_1 + 3x_2 + 4x_3 + x_4 &= 1, \\
x_1 + 4x_2 + 2x_3 - 2x_4 &= 3.
\end{aligned} \tag{17}$$


We solve this system of equations by eliminating the unknowns one by one; here is
how it is done. We use the first equation in (17) to eliminate $x_1$ from the rest of the
equations. To accomplish this, subtract two times the first equation from the second
and the third equations, obtaining

$$x_2 - 2x_3 - x_4 = 5 \tag*{$(18)_1$}$$

and

$$-x_2 - 2x_3 + 3x_4 = 5. \tag*{$(18)_2$}$$

Subtract the first equation from the fourth one, obtaining

$$2x_2 - x_3 - x_4 = 5. \tag*{$(18)_3$}$$

We use the same technique to eliminate $x_2$ from the set of three equations (18).
We obtain

$$-4x_3 + 2x_4 = 10, \tag*{$(19)_1$}$$
$$3x_3 + x_4 = -5. \tag*{$(19)_2$}$$

Finally we eliminate $x_3$ from equations (19) by adding 3/4 times $(19)_1$ to $(19)_2$; we
get

$$\tfrac{5}{2} x_4 = \tfrac{5}{2},$$

which yields

$$x_4 = 1. \tag*{$(20)_4$}$$

We proceed in the reverse order, by backward substitution, to determine the other
unknowns. Setting the value of $x_4$ from $(20)_4$ into equation $(19)_1$ gives

$$-4x_3 + 2 = 10,$$

which yields

$$x_3 = -2. \tag*{$(20)_3$}$$

We could have used equation $(19)_2$ and would have gotten the same answer.
We determine $x_2$ from any of the equations (18), say $(18)_1$, by using $(20)_3$ and
$(20)_4$ for $x_3$ and $x_4$. We get

$$x_2 + 4 - 1 = 5,$$

so

$$x_2 = 2. \tag*{$(20)_2$}$$

Finally we determine $x_1$ from, say, the first equation (17), using the previously
determined values of $x_4$, $x_3$, and $x_2$:

$$x_1 = 1. \tag*{$(20)_1$}$$


EXERCISE 5. Show that $x_1$, $x_2$, $x_3$, and $x_4$ given by $(20)_j$ satisfy all four equations
(17).


Notice that the order in which we eliminate the unknowns, along with the 
equations which we use to eliminate them, is arbitrary. We shall return to these 
points. 
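
The elimination and backward substitution steps above can be sketched in a few lines of code. This is an illustrative sketch rather than the author's procedure: it assumes the coefficients of system (17) as read here, uses exact rational arithmetic from Python's `fractions` module, and the function name `solve` is hypothetical. When the natural pivot is zero it simply picks the next equation with a nonzero coefficient.

```python
from fractions import Fraction


def solve(A, u):
    """Solve A x = u by Gaussian elimination with exact rational arithmetic."""
    n = len(A)
    # Augmented matrix: each row holds the coefficients followed by the right side.
    M = [[Fraction(a) for a in row] + [Fraction(b)] for row, b in zip(A, u)]
    for k in range(n):
        # Pick an equation (at or below row k) that actually contains x_k.
        pivot = next((r for r in range(k, n) if M[r][k] != 0), None)
        if pivot is None:
            raise ValueError("system does not have a unique solution")
        M[k], M[pivot] = M[pivot], M[k]
        # Eliminate x_k from the remaining equations.
        for r in range(k + 1, n):
            factor = M[r][k] / M[k][k]
            M[r] = [a - factor * b for a, b in zip(M[r], M[k])]
    # Backward substitution, from the last unknown to the first.
    x = [Fraction(0)] * n
    for k in reversed(range(n)):
        known = sum(M[k][j] * x[j] for j in range(k + 1, n))
        x[k] = (M[k][n] - known) / M[k][k]
    return x


A = [[1, 2, 3, -1],
     [2, 5, 4, -3],
     [2, 3, 4, 1],
     [1, 4, 2, -2]]
u = [-2, 1, 1, 3]
print([int(v) for v in solve(A, u)])  # [1, 2, -2, 1]
```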

A system of $n$ equations

$$\sum_{j=1}^{n} t_{ij} x_j = u_i, \qquad i = 1, \dots, n, \tag{21}$$

for $n$ unknowns $x_1, \dots, x_n$ may have a unique solution, may have no solution, or may
have many solutions. We show now how to use Gaussian elimination to determine all
solutions, or conclude that no solution exists. Here is an example that illustrates the
last two possibilities.

$$\begin{aligned}
x_1 + x_2 + 2x_3 + 3x_4 &= u_1, \\
x_1 + 2x_2 + 3x_3 + x_4 &= u_2, \\
2x_1 + x_2 + 2x_3 + 3x_4 &= u_3, \\
3x_1 + 4x_2 + 6x_3 + 2x_4 &= u_4.
\end{aligned} \tag{22}$$

We eliminate $x_1$ from the last three equations by subtracting from them an
appropriate multiple of the first equation:

$$\begin{aligned}
x_2 + x_3 - 2x_4 &= u_2 - u_1, \\
-x_2 - 2x_3 - 3x_4 &= u_3 - 2u_1, \\
x_2 - 7x_4 &= u_4 - 3u_1.
\end{aligned}$$

We use the first equation above to eliminate $x_2$ from the last two:

$$\begin{aligned}
-x_3 - 5x_4 &= u_3 + u_2 - 3u_1, \\
-x_3 - 5x_4 &= u_4 - u_2 - 2u_1.
\end{aligned}$$

We eliminate $x_3$ by subtracting the last two equations from each other. We find that
thereby we have eliminated $x_4$ as well, and we get

$$0 = u_4 - u_3 - 2u_2 + u_1. \tag{23}$$

This is the necessary and sufficient condition for the system of equations (22) to have
a solution.


EXERCISE 6. Choose values of $u_1$, $u_2$, $u_3$, $u_4$ so that condition (23) is satisfied,
and determine all solutions of equations (22).


Equation (22) can be written in matrix notation as

$$Mx = u, \tag{22$'$}$$

where $x$ and $u$ are column vectors with components $x_1, x_2, x_3, x_4$ and $u_1, u_2, u_3, u_4$,
and

$$M = \begin{pmatrix} 1 & 1 & 2 & 3 \\ 1 & 2 & 3 & 1 \\ 2 & 1 & 2 & 3 \\ 3 & 4 & 6 & 2 \end{pmatrix}.$$


EXERCISE 7. Verify that $l = (1, -2, -1, 1)$ is a left nullvector of $M$:

$$lM = 0.$$

Multiply equation (22)' on the left by $l$; using the result of Exercise 7, we get that

$$lMx = lu = 0,$$

a rederivation of (23) as a necessary condition for (22) to have a solution.

EXERCISE 8. Show by Gaussian elimination that the only left nullvectors of $M$
are multiples of $l$ in Exercise 7, and then use Theorem 5 of Chapter 3 to show that
condition (23) is sufficient for the solvability of the system (22).
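
As a numerical companion to Exercise 7 (not a substitute for the proof), the following sketch checks that $l$ annihilates $M$ from the left and tests condition (23) on two right-hand sides. The matrix entries are the coefficients of system (22) as read here; the helper names `row_times_matrix` and `satisfies_condition_23` are hypothetical.

```python
M = [[1, 1, 2, 3],
     [1, 2, 3, 1],
     [2, 1, 2, 3],
     [3, 4, 6, 2]]
l = [1, -2, -1, 1]


def row_times_matrix(row, matrix):
    """Product of a row vector with a matrix: the j-th entry is sum_i row_i * matrix_ij."""
    return [sum(row[i] * matrix[i][j] for i in range(len(matrix)))
            for j in range(len(matrix[0]))]


def satisfies_condition_23(u):
    """Condition (23): u_1 - 2 u_2 - u_3 + u_4 = 0, i.e. l u = 0."""
    return sum(li * ui for li, ui in zip(l, u)) == 0


print(row_times_matrix(l, M))                # [0, 0, 0, 0]

# u = M x for x = (1, 1, 1, 1) is in the range of M, so it must pass the test.
u_in_range = [sum(row) for row in M]         # [7, 7, 8, 15]
print(satisfies_condition_23(u_in_range))    # True
print(satisfies_condition_23([1, 0, 0, 0]))  # False
```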


Next we show how to use Gaussian elimination to prove Corollary A' in
Chapter 3:
A system of homogeneous linear equations

$$\sum_{j=1}^{n} t_{ij} x_j = 0, \qquad i = 1, \dots, m, \tag{24}$$

with fewer equations than unknowns, $m < n$, has a nontrivial solution, that is, one
where at least one of the $x_j$ is nonzero.


Proof. We use one of the equations (24) to express $x_1$ as a linear function of the
rest of the $x$'s:

$$x_1 = l_1(x_2, \dots, x_n). \tag*{$(25)_1$}$$

We replace $x_1$ by $l_1$ in the remaining equations, and we use one of them to express $x_2$
as a linear function of the remaining $x$'s:

$$x_2 = l_2(x_3, \dots, x_n). \tag*{$(25)_2$}$$

We proceed in this fashion until we reach $x_m$:

$$x_m = l_m(x_{m+1}, \dots, x_n). \tag*{$(25)_m$}$$

Since there were only $m$ equations and $m < n$, there are no more equations left to be
satisfied. So we choose the values of $x_{m+1}, \dots, x_n$ arbitrarily, and we use equations
$(25)_m, (25)_{m-1}, \dots, (25)_1$, in this order, to determine the values of $x_m, x_{m-1}, \dots, x_1$.

This procedure may break down at the $i$th step if none of the remaining equations
contain $x_i$. In this case we set $x_{i+1}, \dots, x_n$ equal to zero, assign an arbitrary value to
$x_i$, and determine $x_{i-1}, \dots, x_1$ from equations $(25)_{i-1}, \dots, (25)_1$, in this
order. $\square$


We conclude this chapter with some observations on how Gaussian elimination
works for determined systems of $n$ inhomogeneous equations

$$\sum_{j=1}^{n} t_{ij} x_j = u_i, \qquad i = 1, \dots, n, \tag{26}$$

for $n$ unknowns $x_1, \dots, x_n$. In its basic form the first equation is used to eliminate $x_1$,
that is, express it as

$$x_1 = v_1 + l_1(x_2, \dots, x_n). \tag*{$(27)_1$}$$

Then $x_1$ is replaced in the remaining equations by $v_1 + l_1$. The first of these equations
is used to express $x_2$ as

$$x_2 = v_2 + l_2(x_3, \dots, x_n). \tag*{$(27)_2$}$$

We proceed in this fashion until after $n - 1$ steps we find the value of $x_n$. Then we
determine the values of $x_{n-1}, \dots, x_1$, in this order, from the relations
$(27)_{n-1}, \dots, (27)_1$.

This procedure may break down right at the start if the coefficient $t_{11}$ of $x_1$ in the
first equation is zero. Even if $t_{11}$ is not zero but very small, using the first equation to
express $x_1$ in terms of the rest of the $x$'s involves division by $t_{11}$ and produces very
large coefficients in formula $(27)_1$. This wouldn't matter if all arithmetic operations
were carried out exactly, but they never are; they are carried out in finite digit
floating point arithmetic, and when $(27)_1$ is substituted in the remaining equations,
the coefficients $t_{ij}$, $i > 1$, are swamped.

A natural remedy is to choose another unknown, $x_j$, for elimination and another
equation to accomplish it, so chosen that $t_{ij}$ is not small compared with the other
coefficients. This strategy is called complete pivoting and is computationally
expensive. A compromise is to keep the original order of the unknowns for
elimination, but use another equation for elimination, for which $t_{i1}$ is not small
compared to the other coefficients. This strategy, called partial pivoting, works very
well in practice (see, e.g., the text entitled Numerical Linear Algebra, by Trefethen
and Bau).
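
The swamping effect is easy to reproduce. The sketch below is an illustration under stated assumptions, not taken from the text: it uses Python's double-precision floats, a hypothetical function `eliminate` with a `pivoting` flag, and a $2 \times 2$ system chosen here whose first coefficient is $10^{-20}$. The exact solution is very close to $x_1 = x_2 = 1$.

```python
def eliminate(A, u, pivoting):
    """Gaussian elimination in floating point, with or without partial pivoting."""
    n = len(A)
    M = [[float(a) for a in row] + [float(b)] for row, b in zip(A, u)]
    for k in range(n):
        if pivoting:
            # Partial pivoting: use the equation whose coefficient of x_k is largest.
            p = max(range(k, n), key=lambda r: abs(M[r][k]))
            M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            factor = M[r][k] / M[k][k]
            M[r] = [a - factor * b for a, b in zip(M[r], M[k])]
    x = [0.0] * n
    for k in reversed(range(n)):
        known = sum(M[k][j] * x[j] for j in range(k + 1, n))
        x[k] = (M[k][n] - known) / M[k][k]
    return x


# 1e-20 * x1 + x2 = 1,   x1 + x2 = 2
A = [[1e-20, 1.0], [1.0, 1.0]]
u = [1.0, 2.0]
print(eliminate(A, u, pivoting=False))  # [0.0, 1.0]  (x1 is lost)
print(eliminate(A, u, pivoting=True))   # [1.0, 1.0]
```

Without pivoting, dividing by the tiny coefficient produces multipliers of size $10^{20}$, and the original entries of the second equation disappear in rounding; exchanging the two equations first avoids this.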


CHAPTER 5 


Determinant and Trace 


In this chapter we shall use the intuitive properties of volume to define the 
determinant of a square matrix. According to the precepts of elementary geometry, 
the concept of volume depends on the notions of length and angle and, in particular, 
perpendicularity, concepts that will be defined only in Chapter 8. Nevertheless, it 
turns out that volume is independent of all these things, except for an arbitrary 
multiplicative constant that can be fixed by specifying that the unit cube have 
volume one. 

We start with the geometric motivation and meaning of determinants. A simplex
in $\mathbb{R}^n$ is a polyhedron with $n + 1$ vertices. We shall take one of the vertices to be the
origin and denote the rest as $a_1, \dots, a_n$. The order in which the vertices are taken
matters, so we call $0, a_1, \dots, a_n$ the vertices of an ordered simplex.

We shall be dealing with two geometrical attributes of ordered simplices, their
orientation and volume. An ordered simplex $S$ is called degenerate if it lies on an
$(n - 1)$-dimensional subspace.

An ordered simplex $(0, a_1, \dots, a_n) = S$ that is nondegenerate can have one of two
orientations: positive or negative. We call $S$ positively oriented if it can be deformed
continuously and nondegenerately into the standard ordered simplex $(0, e_1, \dots, e_n)$,
where $e_j$ is the $j$th unit vector in the standard basis of $\mathbb{R}^n$. By such deformation we
mean $n$ vector-valued continuous functions $a_j(t)$ of $t$, $0 \le t \le 1$, such that (i)
$S(t) = (0, a_1(t), \dots, a_n(t))$ is nondegenerate for all $t$ and (ii) $a_j(0) = a_j$, $a_j(1) = e_j$.
Otherwise $S$ is called negatively oriented.

For a nondegenerate oriented simplex $S$ we define $O(S)$ as $+1$ or $-1$, depending
on the orientation of $S$, and zero when $S$ is degenerate.

The volume of a simplex is given by the elementary formula

$$\mathrm{Vol}(S) = \frac{1}{n}\,\mathrm{Vol}_{n-1}(\text{Base})\;\text{Altitude}. \tag{1}$$

By base we mean any of the $(n - 1)$-dimensional faces of $S$, and by altitude we mean
the distance of the opposite vertex from the hyperplane that contains the base.
A more useful concept is signed volume, denoted as $\Sigma(S)$, and defined by

$$\Sigma(S) = O(S)\,\mathrm{Vol}(S). \tag{2}$$

Since $S$ is described by its vertices, $\Sigma(S)$ is a function of $a_1, \dots, a_n$. Clearly, when
two vertices are equal, $S$ is degenerate, and therefore we have the following:
(i) $\Sigma(S) = 0$ if $a_j = a_k$, $j \ne k$.
A second property of $\Sigma(S)$ is its dependence on $a_j$ when the other vertices are
kept fixed:
(ii) $\Sigma(S)$ is a linear function of $a_j$ when the other $a_k$, $k \ne j$, are kept fixed.


To see why, we combine formulas (1) and (2) as

$$\Sigma(S) = \frac{1}{n}\,\mathrm{Vol}_{n-1}(\text{Base})\,k, \tag{1$'$}$$

where

$$k = O(S)\;\text{Altitude}.$$

The altitude is the distance of the vertex $a_j$ from the base; we call $k$ the signed distance of the
vertex from the hyperplane containing the base, because $O(S)$ has one sign when $a_j$
lies on one side of the base and the opposite sign when $a_j$ lies on the opposite side.

We claim that when the base is fixed, $k$ is a linear function of $a_j$. To see why this is
so we introduce Cartesian coordinate axes so that the first axis is perpendicular to the
base and the rest lie in the base plane. By definition of Cartesian coordinates, the first
coordinate $k_1(a)$ of a vector $a$ is its signed distance from the hyperplane spanned by
the other axes. According to Theorem 1(i) in Chapter 2, $k_1(a)$ is a linear function of
$a$. Assertion (ii) now follows from formula (1)'.

Determinants are related to the signed volume of ordered simplices by the
classical formula

$$\Sigma(S) = \frac{1}{n!}\,D(a_1, \dots, a_n), \tag{3}$$

where $D$ is the abbreviation of the determinant whose columns are $a_1, \dots, a_n$. Rather
than start with a formula for the determinant, we shall deduce it from the properties
forced on it by the geometric properties of signed volume. This approach to
determinants is due to E. Artin.


Property (i). $D(a_1, \dots, a_n) = 0$ if $a_i = a_j$, $i \ne j$.

Property (ii). $D(a_1, \dots, a_n)$ is a multilinear function of its arguments, in the
sense that if all $a_i$, $i \ne j$, are fixed, $D$ is a linear function of the remaining
argument $a_j$.

Property (iii). Normalization:

$$D(e_1, \dots, e_n) = 1. \tag{4}$$

We show now that all remaining properties of $D$ can be deduced from those so far
postulated.


Property (iv). $D$ is an alternating function of its arguments, in the sense that if
$a_i$ and $a_j$ are interchanged, $i \ne j$, the value of $D$ changes by the factor $(-1)$.


Proof. Since only the $i$th and $j$th argument change, we shall indicate only these.
Setting $a_i = a$, $a_j = b$ we can write, using Properties (i) and (ii):

$$\begin{aligned}
D(a, b) &= D(a, b) + D(a, a) = D(a, a + b) \\
&= D(a, a + b) - D(a + b, a + b) \\
&= -D(b, a + b) = -D(b, a) - D(b, b) = -D(b, a). \qquad \square
\end{aligned}$$

Property (v). If $a_1, \dots, a_n$ are linearly dependent, then $D(a_1, \dots, a_n) = 0$.

Proof. If $a_1, \dots, a_n$ are linearly dependent, then one of them, say $a_1$, can be
expressed as a linear combination of the others:

$$a_1 = k_2 a_2 + \cdots + k_n a_n.$$

Then, using Property (ii),

$$\begin{aligned}
D(a_1, \dots, a_n) &= D(k_2 a_2 + \cdots + k_n a_n, a_2, \dots, a_n) \\
&= k_2 D(a_2, a_2, \dots, a_n) + \cdots + k_n D(a_n, a_2, \dots, a_n).
\end{aligned}$$

By Property (i), all terms in the last line are zero. $\square$


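Next we introduce the concept of permutation. A permutation is a mapping $p$ of $n$
objects, say the numbers $1, 2, \dots, n$, onto themselves. Like all functions,
permutations can be composed. Being onto, they are one-to-one and so can be
inverted. Thus they form a group; these groups, except for $n = 2$, are
noncommutative.

We denote $p(k)$ as $p_k$; it is convenient to display the action of $p$ by a table:

$$p = \begin{pmatrix} 1 & 2 & \cdots & n \\ p_1 & p_2 & \cdots & p_n \end{pmatrix}.$$

Example 1. $p = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 4 & 1 & 3 \end{pmatrix}$. Then

$$p^2 = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 4 & 3 & 2 & 1 \end{pmatrix}, \qquad p^{-1} = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 1 & 4 & 2 \end{pmatrix},$$

$$p^3 = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 1 & 4 & 2 \end{pmatrix}, \qquad p^4 = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 2 & 3 & 4 \end{pmatrix}.$$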
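
Composition and inversion of permutations are easy to experiment with. In the sketch below, which is an illustration and not part of the text, a permutation is stored as the tuple of its images $(p_1, \dots, p_n)$, i.e. the bottom row of its table; the function names `compose` and `inverse` are chosen here. The example uses the permutation with bottom row 2 4 1 3.

```python
def compose(p, q):
    """Composite p o q, defined by (p o q)(k) = p(q(k)); entries are 1-based images."""
    return tuple(p[q[k] - 1] for k in range(len(q)))


def inverse(p):
    """Inverse permutation: if p sends k to p_k, the inverse sends p_k back to k."""
    inv = [0] * len(p)
    for k, image in enumerate(p, start=1):
        inv[image - 1] = k
    return tuple(inv)


p = (2, 4, 1, 3)
p2 = compose(p, p)
p3 = compose(p, p2)
p4 = compose(p, p3)
print(p2)          # (4, 3, 2, 1)
print(inverse(p))  # (3, 1, 4, 2)
print(p3)          # (3, 1, 4, 2)
print(p4)          # (1, 2, 3, 4)
```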
Next we introduce the concept of signature of a permutation, denoted as $\sigma(p)$. Let
$x_1, \dots, x_n$ be $n$ variables; their discriminant is defined to be

$$P(x_1, \dots, x_n) = \prod_{i<j} (x_i - x_j). \tag{5}$$

Let $p$ be any permutation. Clearly,

$$P(x_{p_1}, \dots, x_{p_n}) = \prod_{i<j} (x_{p_i} - x_{p_j})$$

is either $P(x_1, \dots, x_n)$ or $-P(x_1, \dots, x_n)$.

Definition. The signature $\sigma(p)$ of a permutation $p$ is defined by

$$P(x_{p_1}, \dots, x_{p_n}) = \sigma(p)\,P(x_1, \dots, x_n). \tag{6}$$

Properties of signature:

$$\text{(a)}\ \ \sigma(p) = +1 \text{ or } -1, \qquad \text{(b)}\ \ \sigma(p_1 \circ p_2) = \sigma(p_1)\,\sigma(p_2). \tag{7}$$


EXERCISE 1. Prove properties (7).


We look now at a special kind of permutation, an interchange. These are defined
for any pair of indices $j$, $k$, $j \ne k$, as follows:

$$p(i) = i \ \text{ for } i \ne j \text{ or } k, \qquad p(j) = k, \quad p(k) = j.$$

Such a permutation is called a transposition. We claim that transpositions have the
following properties:
(c) The signature of a transposition $t$ is minus one:

$$\sigma(t) = -1. \tag{8}$$

(d) Every permutation $p$ can be written as a composition of transpositions:

$$p = t_k \circ \cdots \circ t_1. \tag{9}$$

EXERCISE 2. Prove (c) and (d) above.

Combining (7) with (8) and (9) we get that

$$\sigma(p) = (-1)^k, \tag{10}$$

where $k$ is the number of factors in the decomposition (9) of $p$.


EXERCISE 3. Show that the decomposition (9) is not unique, but that the parity
of the number $k$ of factors is unique.


Example 2. The permutation $p = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 2 & 4 & 5 & 1 & 3 \end{pmatrix}$ is the product of three transpositions

$$t_1 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 1 & 2 & 5 & 4 & 3 \end{pmatrix}, \qquad t_2 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 2 & 1 & 3 & 4 & 5 \end{pmatrix}, \qquad t_3 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 4 & 2 & 3 & 1 & 5 \end{pmatrix};$$

$$p = t_3 \circ t_2 \circ t_1.$$

We return now to the function $D$. Its arguments $a_j$ are column vectors

$$a_j = \begin{pmatrix} a_{1j} \\ \vdots \\ a_{nj} \end{pmatrix}, \qquad j = 1, \dots, n. \tag{11}$$

This is the same as

$$a_j = a_{1j} e_1 + \cdots + a_{nj} e_n. \tag{11$'$}$$

Using Property (ii), multilinearity, we can write

$$\begin{aligned}
D(a_1, \dots, a_n) &= D(a_{11} e_1 + \cdots + a_{n1} e_n, a_2, \dots, a_n) \\
&= a_{11} D(e_1, a_2, \dots, a_n) + \cdots + a_{n1} D(e_n, a_2, \dots, a_n).
\end{aligned} \tag{12}$$

Next we express $a_2$ as a linear combination of $e_1, \dots, e_n$ and obtain a formula like
(12) but containing $n^2$ terms. Repeating this process $n$ times we get

$$D(a_1, \dots, a_n) = \sum_f a_{f_1 1} a_{f_2 2} \cdots a_{f_n n}\, D(e_{f_1}, \dots, e_{f_n}), \tag{13}$$

where the summation is over all functions $f$ mapping $\{1, \dots, n\}$ into $\{1, \dots, n\}$.
If the mapping $f$ is not a permutation, then $f_i = f_j$ for some pair $i \ne j$, and by
Property (i),

$$D(e_{f_1}, \dots, e_{f_n}) = 0. \tag{14}$$

This shows that in (13) we need sum only over those $f$ that are permutations.
We saw earlier that each permutation can be decomposed into $k$ transpositions
(9). According to Property (iv), a single transposition of its arguments changes the
value of $D$ by a factor of $(-1)$. Therefore $k$ transpositions change it by the factor
$(-1)^k$. Thus, using (10),

$$D(e_{p_1}, \dots, e_{p_n}) = \sigma(p)\, D(e_1, \dots, e_n) \tag{15}$$

for any permutation. Setting (14) and (15) into (13) we get, after using the
normalization (4), that

$$D(a_1, \dots, a_n) = \sum_p \sigma(p)\, a_{p_1 1} a_{p_2 2} \cdots a_{p_n n}. \tag{16}$$

This is the formula for $D$ in terms of the components of its arguments.
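
Formula (16) can be evaluated literally for small matrices. The sketch below is illustrative only: it stores a matrix as a list of rows with 0-based indices, uses `itertools.permutations` from the Python standard library to enumerate all $n!$ permutations, and computes the signature by counting inversions. The names `signature` and `det` are chosen here. With $n!$ terms the cost grows very quickly, so this is useful for checking small cases rather than for computation.

```python
from itertools import permutations


def signature(p):
    """(-1)**(number of inversions) of a permutation given as a tuple of images."""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return -1 if inversions % 2 else 1


def det(A):
    """Determinant via formula (16): sum over p of sigma(p) * a_{p_1 1} * ... * a_{p_n n}."""
    n = len(A)
    total = 0
    for p in permutations(range(n)):
        term = signature(p)
        for j in range(n):
            term *= A[p[j]][j]  # entry in row p_j, column j
        total += term
    return total


print(det([[1, 2], [3, 4]]))                   # -2
print(det([[2, 0, 1], [1, 3, 2], [1, 1, 4]]))  # 18
print(det([[1, 1, 5], [2, 2, 6], [3, 3, 7]]))  # 0, two equal columns
```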
Formula (16) was derived using solely Properties (i), (ii), and (iii) of
determinants. Therefore we conclude with the following theorem.


Theorem 1. Properties (i), (ii), and (iii) uniquely determine the determinant as
a function of $a_1, \dots, a_n$.


EXERCISE 4. Show that $D$ defined by (16) has Properties (ii), (iii), and (iv).


EXERCISE 5. Show that Property (iv) implies Property (i), unless the field $K$ has
characteristic two, that is, $1 + 1 = 0$.


Definition. Let $A$ be an $n \times n$ matrix; denote its column vectors by
$a_1, \dots, a_n$: $A = (a_1, \dots, a_n)$. Its determinant, denoted as $\det A$, is

$$\det A = D(a_1, \dots, a_n), \tag{17}$$

where $D$ is defined by formula (16).


The determinant has Properties (i)-(v) that have been derived and verified for the
function $D$. We state now an additional important property.


Theorem 2. For all pairs of $n \times n$ matrices $A$ and $B$,

$$\det(BA) = \det A \,\det B. \tag{18}$$


Proof. According to equation (7) of Chapter 4, the $j$th column of $BA$ is $(BA)e_j$.
The $j$th column $a_j$ of $A$ is $Ae_j$; therefore the $j$th column of $BA$ is

$$(BA)e_j = BAe_j = Ba_j.$$

By definition (17),

$$\det(BA) = D(Ba_1, \dots, Ba_n). \tag{19}$$

We assume now that $\det B \ne 0$ and define the function $C$ as follows:

$$C(a_1, \dots, a_n) = \frac{\det(BA)}{\det B}. \tag{20}$$

Using (19) we can express $C$ as follows:

$$C(a_1, \dots, a_n) = \frac{D(Ba_1, \dots, Ba_n)}{\det B}. \tag{20$'$}$$

We claim that the function $C$ has Properties (i)-(iii) postulated for $D$.

(i) If $a_i = a_j$, $i \ne j$, then $Ba_i = Ba_j$; since $D$ has Property (i), it follows that the
right-hand side of (20)' is zero. This shows that $C$ also has Property (i).

(ii) Since $Ba_i$ is a linear function of $a_i$, and since $D$ is a multilinear function,
it follows that the right-hand side of (20)' is also a multilinear function. This shows
that $C$ is a multilinear function of $a_1, \dots, a_n$, that is, has Property (ii).

(iii) Setting $a_i = e_i$, $i = 1, 2, \dots, n$, into formula (20)', we get

$$C(e_1, \dots, e_n) = \frac{D(Be_1, \dots, Be_n)}{\det B}. \tag{21}$$

Now $Be_i$ is the $i$th column $b_i$ of $B$, so that the right-hand side of (21) is

$$\frac{D(b_1, \dots, b_n)}{\det B}. \tag{22}$$

By definition (17) applied to $B$, (22) equals 1; setting this into (21) we see that
$C(e_1, \dots, e_n) = 1$. This proves that $C$ satisfies Property (iii).

We have shown in Theorem 1 that a function $C$ that satisfies Properties (i)-(iii) is
equal to the function $D$. So

$$C(a_1, \dots, a_n) = D(a_1, \dots, a_n) = \det A.$$

Setting this into (20) proves (18) when $\det B \ne 0$.
When $\det B = 0$ we argue as follows: define the matrix $B(t)$ as

$$B(t) = B + tI.$$

Clearly, $B(0) = B$. Formula (16) shows that $D(B(t))$ is a polynomial of degree $n$, and
that the coefficient of $t^n$ equals one. Therefore, $D(B(t))$ is zero for no more than $n$
values of $t$; in particular $D(B(t)) \ne 0$ for all $t$ near zero but not equal to zero.
According to what we have already shown, $\det(B(t)A) = \det A \,\det B(t)$ for all such
values of $t$; letting $t$ tend to zero yields (18). $\square$

Corollary 3. An $n \times n$ matrix $A$ is invertible iff $\det A \ne 0$.


Proof. Suppose $A$ is not invertible; then its range is a proper subspace of $\mathbb{R}^n$. The
range of $A$ consists of all linear combinations of the columns of $A$; therefore the
columns are linearly dependent. According to Property (v), this implies that
$\det A = 0$.

Suppose, on the other hand, that $A$ is invertible; denote its inverse by $B$:

$$BA = I.$$

According to Theorem 2,

$$\det B \,\det A = \det I.$$

By Property (iii), $\det I = D(e_1, \dots, e_n) = 1$; so

$$\det B \,\det A = 1,$$

which shows that $\det A \ne 0$. $\square$


The geometric meaning of the multiplicative property of determinants is this: the 
linear mapping B maps every simplex onto another simplex whose volume is |det B| 
times the volume of the original simplex. Since every open set is the union of 
simplices, it follows that the volume of the image under B of any open set is |det B| 
times the original volume. 


We turn now to yet another property of determinants. We need the following 
lemma. 


Lemma 4. Let $A$ be an $n \times n$ matrix whose first column is $e_1$:

$$A = \begin{pmatrix} 1 & \times & \cdots & \times \\ 0 & & & \\ \vdots & & A_{11} & \\ 0 & & & \end{pmatrix}; \tag{23}$$

here $A_{11}$ denotes the $(n - 1) \times (n - 1)$ submatrix formed by entries $a_{ij}$, $i > 1$, $j > 1$.
We claim that

$$\det A = \det A_{11}. \tag{24}$$


Proof. As first step we show that

$$\det A = \det \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & A_{11} & \\ 0 & & & \end{pmatrix}. \tag{25}$$

For it follows from Properties (i) and (ii) that if we alter a matrix by adding a
multiple of one of its columns to another, the altered matrix has the same
determinant as the original. Clearly, by adding suitable multiples of the first column
of $A$ to the others we can turn it into the matrix on the right in (25).
We regard now

$$C(A_{11}) = \det \begin{pmatrix} 1 & 0 \\ 0 & A_{11} \end{pmatrix}$$

as a function of the matrix $A_{11}$. Clearly it has Properties (i)-(iii). Therefore it must
be equal to $\det A_{11}$. Combining this with (25) gives (24). $\square$


EXERCISE 6. Verify that $C(A_{11})$ has Properties (i)-(iii).

Corollary 5. Let $A$ be a matrix whose $j$th column is $e_i$. Then

$$\det A = (-1)^{i+j} \det A_{ij}, \tag{25$'$}$$

where $A_{ij}$ is the $(n - 1) \times (n - 1)$ matrix obtained by striking out the $i$th row and $j$th
column of $A$; $A_{ij}$ is called the $(ij)$th minor of $A$.


EXERCISE 7. Deduce Corollary 5 from Lemma 4.


We deduce now the so-called Laplace expansion of a determinant according to its 
columns. 


Theorem 6. Let $A$ be any $n \times n$ matrix and $j$ any index between 1 and $n$.
Then

$$\det A = \sum_{i=1}^{n} (-1)^{i+j} a_{ij} \det A_{ij}. \tag{26}$$


Proof. To simplify notation, we take $j = 1$. We write $a_1$ as a linear combination
of standard unit vectors:

$$a_1 = a_{11} e_1 + \cdots + a_{n1} e_n.$$

Using multilinearity, we get

$$\begin{aligned}
\det A = D(a_1, \dots, a_n) &= D(a_{11} e_1 + \cdots + a_{n1} e_n, a_2, \dots, a_n) \\
&= a_{11} D(e_1, a_2, \dots, a_n) + \cdots + a_{n1} D(e_n, a_2, \dots, a_n).
\end{aligned}$$

Using Corollary 5, we obtain (26). $\square$

We show now how determinants can be used to express solutions of systems of
equations of the form


$$Ax = u, \tag{27}$$

$A$ an invertible $n \times n$ matrix. Write

$$x = \sum_j x_j e_j;$$

according to (7) of Chapter 4, $Ae_j = a_j$, the $j$th column of $A$. So (27) is equivalent to

$$\sum_j x_j a_j = u. \tag{27$'$}$$

We consider now the matrix $A_k$ obtained by replacing the $k$th column of $A$ by $u$:

$$A_k = (a_1, \dots, a_{k-1}, u, a_{k+1}, \dots, a_n).$$

We form the determinant and use its multilinearity,

$$\det A_k = \sum_j x_j \det(a_1, \dots, a_{k-1}, a_j, a_{k+1}, \dots, a_n).$$

Because of Property (i) of determinants, the only nonzero term on the right is the $k$th,
so we get

$$\det A_k = x_k \det A.$$

Since $A$ is invertible, $\det A \ne 0$; so

$$x_k = \frac{\det A_k}{\det A}. \tag{28}$$

We use now the Laplace expansion of $\det A_k$ according to its $k$th column; we get

$$\det A_k = \sum_i (-1)^{i+k} u_i \det A_{ik},$$

and so, using (28),

$$x_k = \sum_i (-1)^{i+k} \frac{\det A_{ik}}{\det A}\, u_i. \tag{29}$$

This is called Cramer's rule for finding the solution of the system of equations (27).
We now translate (29) into matrix language.


Theorem 7. The inverse matrix $A^{-1}$ of an invertible matrix $A$ has the form

$$(A^{-1})_{ki} = (-1)^{i+k} \frac{\det A_{ik}}{\det A}. \tag{30}$$

Proof. Since $A$ is invertible, $\det A \ne 0$. $A^{-1}$ acts on the vector $u$; see formula (1)
of Chapter 4,

$$(A^{-1}u)_k = \sum_i (A^{-1})_{ki}\, u_i. \tag{31}$$

Using (30) in (31) and comparing it to (29) we get that

$$(A^{-1}u)_k = x_k, \qquad k = 1, \dots, n, \tag{32}$$

that is,

$$A^{-1}u = x.$$

This shows that $A^{-1}$ as defined by (30) is indeed the inverse of $A$ whose action is
given in (27). $\square$
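
The Laplace expansion (26) and the cofactor formula (30) can be checked on small integer matrices. The following sketch is illustrative only and assumes $n \ge 2$: `minor`, `det`, and `inverse` are names chosen here, indices are 0-based (which leaves the parity of $i + k$ unchanged), and `fractions.Fraction` from the standard library keeps the entries of the inverse exact.

```python
from fractions import Fraction


def minor(A, i, j):
    """Matrix obtained from A by striking out row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]


def det(A):
    """Determinant by Laplace expansion (26) along the first column."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** i * A[i][0] * det(minor(A, i, 0)) for i in range(len(A)))


def inverse(A):
    """Inverse via formula (30): (A^{-1})_{ki} = (-1)^{i+k} det A_{ik} / det A."""
    n = len(A)
    d = det(A)
    if d == 0:
        raise ValueError("matrix is singular")
    return [[Fraction((-1) ** (i + k) * det(minor(A, i, k)), d) for i in range(n)]
            for k in range(n)]


A = [[2, 0, 1], [1, 3, 2], [1, 1, 4]]
A_inv = inverse(A)
print(det(A))  # 18
# Multiplying A by the computed inverse should give the unit matrix.
product = [[sum(A[i][k] * A_inv[k][j] for k in range(3)) for j in range(3)]
           for i in range(3)]
print(product == [[1, 0, 0], [0, 1, 0], [0, 0, 1]])  # True
```

Each entry of the inverse requires an $(n-1) \times (n-1)$ determinant, so the cost of this approach grows very rapidly with $n$.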


We caution the reader that for $n > 3$, formula (30) is not a practical numerical
method for inverting matrices.


EXERCISE 8. Show that for any square matrix

$$\det A^T = \det A, \qquad A^T = \text{transpose of } A. \tag{33}$$

[Hint: Use formula (16) and show that for any permutation $\sigma(p) = \sigma(p^{-1})$.]


EXERCISE 9. Given a permutation $p$ of $n$ objects, we define an associated so-called
permutation matrix $P$ as follows:

$$P_{ij} = \begin{cases} 1 & \text{if } j = p(i), \\ 0 & \text{otherwise.} \end{cases} \tag{34}$$

Show that the action of $P$ on any vector $x$ performs the permutation $p$ on the
components of $x$. Show that if $p$, $q$ are two permutations and $P$, $Q$ are the associated
permutation matrices, then the permutation matrix associated with $p \circ q$ is the product
$PQ$.

The determinant is an important scalar-valued function of $n \times n$ matrices.
Another equally important scalar-valued function is the trace.


Definition. The trace of a square matrix $A$, denoted as $\operatorname{tr} A$, is the sum of the
entries on its diagonal:

$$\operatorname{tr} A = \sum_i a_{ii}. \tag{35}$$

Theorem 8. (a) Trace is a linear function:

$$\operatorname{tr} kA = k \operatorname{tr} A, \qquad \operatorname{tr}(A + B) = \operatorname{tr} A + \operatorname{tr} B.$$

(b) Trace is "commutative"; that is,

$$\operatorname{tr}(AB) = \operatorname{tr}(BA) \tag{36}$$

for any pair of matrices.


Proof. Linearity is obvious from definition (35). To prove part (b), we use the
rule [see (10)' of Chapter 4] for matrix multiplication:

$$(AB)_{ii} = \sum_k a_{ik} b_{ki}$$

and

$$(BA)_{ii} = \sum_k b_{ik} a_{ki}.$$

So

$$\operatorname{tr}(AB) = \sum_{i,k} a_{ik} b_{ki} = \sum_{i,k} b_{ik} a_{ki} = \operatorname{tr}(BA)$$

follows if one interchanges the names of the indices $i$, $k$. $\square$
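
The index-swapping argument applies equally when $A$ is $m \times n$ and $B$ is $n \times m$, so that $AB$ and $BA$ are square matrices of different sizes. A small numerical check, offered as an illustration with matrices and function names (`matmul`, `trace`) chosen here:

```python
def matmul(S, T):
    """Product ST of matrices stored as lists of rows."""
    return [[sum(S[k][i] * T[i][j] for i in range(len(T))) for j in range(len(T[0]))]
            for k in range(len(S))]


def trace(A):
    """Sum of the diagonal entries of a square matrix."""
    return sum(A[i][i] for i in range(len(A)))


A = [[1, 2, 3], [4, 5, 6]]    # 2 x 3
B = [[1, 0], [2, 1], [0, 3]]  # 3 x 2
print(matmul(A, B))           # [[5, 11], [14, 23]]
print(trace(matmul(A, B)))    # 28
print(trace(matmul(B, A)))    # 28, although BA is 3 x 3
```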


We recall from the end of Chapter 3 the notion of similarity. The matrix $A$ is
called similar to the matrix $B$ if there is an invertible matrix $S$ such that

$$A = SBS^{-1}. \tag{37}$$

We recall from Theorem 8 of Chapter 3 that similarity is an equivalence relation;
that is, it is the following:

(i) Reflexive: $A$ is similar to itself.
(ii) Symmetric: if $A$ is similar to $B$, $B$ is similar to $A$.
(iii) Transitive: if $A$ is similar to $B$, and $B$ is similar to $C$, then $A$ is similar to $C$.

Theorem 9. Similar matrices have the same determinant and the same trace.


Proof. Using Theorem 2, we get from (37)

$$\begin{aligned}
\det A &= (\det S)(\det B)(\det S^{-1}) = (\det B)(\det S)\det(S^{-1}) \\
&= \det B \,\det(SS^{-1}) = (\det B)(\det I) = \det B.
\end{aligned}$$

To show the second part we use Theorem 8(b):

$$\operatorname{tr} A = \operatorname{tr}(SBS^{-1}) = \operatorname{tr}((SB)S^{-1}) = \operatorname{tr}(S^{-1}(SB)) = \operatorname{tr} B. \qquad \square$$


At the end of Chapter 4 we remarked that any linear map $T$ of an $n$-dimensional
linear space $X$ into itself can, by choosing a basis in $X$, be represented as an $n \times n$
matrix. Two different representations, coming from two different choices
of bases, are similar. In view of Theorem 9, we can define the determinant
and trace of such a linear map $T$ as the determinant and trace of a matrix
representing $T$.
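
A small numerical check of Theorem 9 may help fix ideas. The matrices below are chosen here purely for illustration; $S$ has determinant 1 so that its inverse has integer entries, and the helper names `matmul`, `trace`, and `det2` are hypothetical.

```python
def matmul(S, T):
    """Product ST of matrices stored as lists of rows."""
    return [[sum(S[k][i] * T[i][j] for i in range(len(T))) for j in range(len(T[0]))]
            for k in range(len(S))]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def det2(A):
    """Determinant of a 2 x 2 matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]


B = [[1, 2], [3, 4]]
S = [[1, 1], [0, 1]]
S_inv = [[1, -1], [0, 1]]            # S * S_inv is the unit matrix
A = matmul(matmul(S, B), S_inv)      # A = S B S^{-1}
print(A)                             # [[4, 2], [3, 1]]
print(trace(A), trace(B))            # 5 5
print(det2(A), det2(B))              # -2 -2
```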


EXERCISE 10. Let A be an m x n matrix, B an n x m matrix. Show that

$$\operatorname{tr} AB = \operatorname{tr} BA.$$


EXERCISE 11. Let A be an n x n matrix, $A^T$ its transpose. Show that

$$\operatorname{tr} AA^T = \sum_{i,j} a_{ij}^2.$$


The square root of the double sum on the right is called the Euclidean, or
Hilbert-Schmidt, norm of the matrix A.

In Chapter 9, Theorem 4, we shall derive an interesting connection between
determinant and trace.


EXERCISE 12. Show that the determinant of the 2 x 2 matrix

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

is D = ad - bc.


EXERCISE 13. Show that the determinant of an upper triangular matrix, one 
whose elements are zero below the main diagonal, equals the product of its elements 
along the diagonal.

EXERCISE 14. How many multiplications does it take to evaluate det A by using
Gaussian elimination to bring it into upper triangular form?


EXERCISE 15. How many multiplications does it take to evaluate det A by 
formula (16)? 


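EXERCISE 16. Show that the determinant of a (3 x 3) matrix

$$A = \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix}$$

can be calculated as follows. Copy the first two columns of A as a fourth and fifth
column:

$$\begin{matrix} a & b & c & a & b \\ d & e & f & d & e \\ g & h & i & g & h \end{matrix}$$

$$\det A = aei + bfg + cdh - gec - hfa - idb.$$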


Show that the sum of the products of the three entries along the dexter diagonals, 
minus the sum of the products of the three entries along the sinister diagonals is 
equal to the determinant of A. 


CHAPTER 6 


Spectral Theory


Spectral theory analyzes linear mappings of a space into itself by decomposing them 
into their basic constituents. We start by posing a problem originating in the stability 
of periodic motions and show how to solve it using spectral theory. 

We assume that the state of the system under study can be described by a finite
number n of parameters; these we lump into a single vector x in $\mathbb{R}^n$. Second, we
assume that the laws governing the evolution in time of the system under study
determine uniquely the state of the system at any future time if the initial state of the
system is given.

Denote by x the state of the system at time t = 0; its state at t = 1 is then
completely determined by x; we denote it as F(x). We assume F to be a differentiable
function. We assume that the laws governing the evolution of the system are the
same at all times; it follows then that if the state of the system at time t = 1 is z, its
state at time t = 2 is F(z). More generally, F relates the state of the system at time t to
its state at t + 1.

Assume that the motion starting at x = 0 is periodic with period one, that is, that it
returns to 0 at time t = 1. That means that

$$F(0) = 0. \tag{1}$$


This periodic motion is called stable if, starting at any point h sufficiently close to
zero, the motion tends to zero as t tends to infinity.

The function F describing the motion is differentiable; therefore for small h, F(h)
is accurately described by a linear approximation:

$$F(h) \approx Ah. \tag{2}$$

For purposes of this discussion we assume that F is a linear function

$$F(h) = Ah, \tag{3}$$

A an n x n matrix. The system starting at h will, after the elapse of N units of time,
be in the position

$$A^N h. \tag{4}$$

In the next few pages we investigate such sequences, that is, of the form

$$h, Ah, A^2 h, \ldots, A^N h, \ldots \tag{5}$$

First a few examples of how powers $A^N$ of matrices behave; we choose N = 1024,
because then $A^N$ can be evaluated performing ten squaring operations:


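[Table: the matrices A of cases (a) and (b) and the computed values of $A^{1024}$. In case (a), $A = \begin{pmatrix} 3 & 2 \\ 1 & 4 \end{pmatrix}$.]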


These numerical experiments strongly suggest that


(a) $A^N \to \infty$ as $N \to \infty$,
(b) $A^N \to 0$ as $N \to \infty$, that is, each entry of $A^N$ tends to zero.
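
The ten squarings can be sketched in a few lines of Python. This is a minimal illustration that assumes the matrix of case (a) is the 2 x 2 matrix with rows (3, 2) and (1, 4) treated in Example 1; the function names are hypothetical. Python integers are exact, so no overflow or rounding occurs.

```python
def matmul2(A, B):
    """Product of two 2 x 2 matrices given as lists of rows."""
    return [[A[0][0] * B[0][0] + A[0][1] * B[1][0],
             A[0][0] * B[0][1] + A[0][1] * B[1][1]],
            [A[1][0] * B[0][0] + A[1][1] * B[1][0],
             A[1][0] * B[0][1] + A[1][1] * B[1][1]]]

def power_by_squaring(A, squarings):
    """Return A**(2**squarings), using exactly `squarings` matrix squarings."""
    P = A
    for _ in range(squarings):
        P = matmul2(P, P)
    return P

A = [[3, 2], [1, 4]]           # assumed matrix of case (a)
P = power_by_squaring(A, 10)   # A**1024, since 2**10 = 1024

# Every entry has more than 700 decimal digits.
print(min(len(str(entry)) for row in P for entry in row))
```

Each squaring doubles the exponent, so ten squarings replace the 1023 multiplications that forming the product factor by factor would need.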


We turn now to a theoretical analysis of the behavior of sequences of the form (5).
Suppose that a vector h ≠ 0 has the special property with respect to the matrix A that
Ah is merely a multiple of h:

$$Ah = ah, \quad \text{where } a \text{ is a scalar and } h \ne 0. \tag{6}$$

Then clearly

$$A^N h = a^N h. \tag*{(6)$_N$}$$

In this case the behavior of the sequence (5) is as follows:

(i) If $|a| > 1$, $A^N h \to \infty$.
(ii) If $|a| < 1$, $A^N h \to 0$.
(iii) If $a = 1$, $A^N h = h$ for all N.

This simple analysis is applicable only if (6) is satisfied. A vector h satisfying (6)
is called an eigenvector of A; a is called an eigenvalue of A.

How farfetched is it to assume that A has an eigenvector? We shall show that
every n x n matrix over the field of complex numbers has an eigenvector. Choose
any nonzero vector w and build the following set of n + 1 vectors:

$$w, Aw, A^2 w, \ldots, A^n w.$$

Since n + 1 vectors in the n-dimensional space $\mathbb{C}^n$ are linearly dependent, there is a
nontrivial linear relation between them:

$$\sum_{j=0}^{n} c_j A^j w = 0,$$

not all $c_j$ zero. We rewrite this relation as

$$p(A)w = 0, \tag{7}$$

where p(t) is the polynomial

$$p(t) = \sum_{j=0}^{n} c_j t^j.$$

Every polynomial over the complex numbers can be written as a product of linear
factors:

$$p(t) = c \prod_j (t - a_j), \quad c \ne 0.$$

p(A) can be similarly factored and (7) rewritten as

$$c \prod_j (A - a_j I) w = 0.$$

This shows that the product $\prod_j (A - a_j I)$ maps the nonzero vector w into 0 and is
therefore not invertible. According to Theorem 4 of Chapter 3, a product of
invertible mappings is invertible. It follows that at least one of the matrices $A - a_j I$ is
not invertible; such a matrix has a nontrivial nullspace. Denote by h any nonzero
vector in the nullspace:

$$(A - aI)h = 0, \quad a = a_j. \tag*{(6)$'$}$$

This is our eigenvalue equation (6).
The argument above shows that every matrix A has at least one eigenvalue, but it
does not show how many or how to calculate them. Here is another approach.
Equation (6) says that h belongs to the nullspace of (A - aI); therefore the
matrix A - aI is not invertible. We saw in Corollary 3 of Chapter 5 that this can
happen if and only if the determinant of the matrix A - aI is zero:

$$\det(aI - A) = 0. \tag{8}$$

So equation (8) is necessary for a to be an eigenvalue of A. It is also sufficient; for if
(8) is satisfied, the matrix A - aI is not invertible. By Theorem 1 of Chapter 3 this
noninvertible matrix has a nonzero nullvector h; (6)' shows that h is an eigenvector
of A. When the determinant is expressed by formula (16) of Chapter 5, (8) appears as
an algebraic equation of degree n for a, where A is an n x n matrix. The left-hand
side of (8) is called the characteristic polynomial of the matrix A and is denoted as
$p_A$.


Example 1

$$A = \begin{pmatrix} 3 & 2 \\ 1 & 4 \end{pmatrix},$$

$$\det(A - aI) = \det\begin{pmatrix} 3 - a & 2 \\ 1 & 4 - a \end{pmatrix} = (3 - a)(4 - a) - 2 = a^2 - 7a + 10 = 0.$$

This equation has two roots,

$$a_1 = 2, \quad a_2 = 5.$$

These are eigenvalues; there is an eigenvector corresponding to each:

$$(A - a_1 I)h_1 = \begin{pmatrix} 1 & 2 \\ 1 & 2 \end{pmatrix} h_1 = 0$$

is satisfied by

$$h_1 = \begin{pmatrix} 2 \\ -1 \end{pmatrix}$$

and of course by any scalar multiple of $h_1$. Similarly,

$$(A - a_2 I)h_2 = \begin{pmatrix} -2 & 2 \\ 1 & -1 \end{pmatrix} h_2 = 0$$

is satisfied by

$$h_2 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$$

and of course by any multiple of $h_2$.

The vectors $h_1$ and $h_2$ are not multiples of each other, so they are linearly
independent. Thus any vector h in $\mathbb{R}^2$ can be expressed as a linear combination of $h_1$
and $h_2$:

$$h = b_1 h_1 + b_2 h_2. \tag{9}$$

We apply $A^N$ to (9) and use relation (6)$_N$:

$$A^N h = b_1 a_1^N h_1 + b_2 a_2^N h_2. \tag*{(9)$_N$}$$

Since $a_1 = 2$, $a_2 = 5$, both $a_1^N = 2^N$ and $a_2^N = 5^N$ tend to infinity; since $h_1$ and $h_2$ are
linearly independent, it follows that also $A^N h$ tends to infinity, unless both $b_1$ and $b_2$
are zero, in which case, by (9), h = 0. Thus we have shown that for $A = \begin{pmatrix} 3 & 2 \\ 1 & 4 \end{pmatrix}$ and
any h ≠ 0, $A^N h \to \infty$ as $N \to \infty$; that is, each component tends to infinity. This
bears out our numerical result in case (a). In fact, $A^N \sim 5^N$, also borne out by the
calculations.


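Example 2. Here is a more interesting case. The Fibonacci sequence $f_0, f_1, \ldots$
is defined by the recurrence relation

$$f_{n+1} = f_n + f_{n-1} \tag{10}$$

with the starting data $f_0 = 0$, $f_1 = 1$. The first ten terms of the sequence are

0, 1, 1, 2, 3, 5, 8, 13, 21, 34;

they seem to be growing rapidly. We shall construct a formula for $f_n$ that displays its
rate of growth. We start by rewriting the recurrence relation (10) in matrix-vector
form:

$$\begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} f_{n-1} \\ f_n \end{pmatrix} = \begin{pmatrix} f_n \\ f_{n+1} \end{pmatrix}. \tag*{(10)$'$}$$

We deduce recursively that

$$\begin{pmatrix} f_n \\ f_{n+1} \end{pmatrix} = A^n \begin{pmatrix} f_0 \\ f_1 \end{pmatrix}, \quad A = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}. \tag{11}$$

We shall represent the nth power of A in terms of its eigenvalues and eigenvectors.

$$\det(A - aI) = \det\begin{pmatrix} -a & 1 \\ 1 & 1 - a \end{pmatrix} = a^2 - a - 1.$$

The zeros of the characteristic polynomial of A are

$$a_1 = \frac{1 + \sqrt{5}}{2}, \quad a_2 = \frac{1 - \sqrt{5}}{2}.$$

Note that $a_1$ is positive and greater than 1, whereas $a_2$ is negative and in absolute
value smaller than 1.

The eigenvectors satisfy the equations

$$\begin{pmatrix} -a_1 & 1 \\ 1 & 1 - a_1 \end{pmatrix} h_1 = 0, \quad \begin{pmatrix} -a_2 & 1 \\ 1 & 1 - a_2 \end{pmatrix} h_2 = 0.$$

These equations are easily solved by looking at the first component:

$$h_1 = \begin{pmatrix} 1 \\ a_1 \end{pmatrix}, \quad h_2 = \begin{pmatrix} 1 \\ a_2 \end{pmatrix};$$

of course any scalar multiples of them are eigenvectors as well.
Next we express the initial vector $(f_0, f_1)' = (0, 1)'$ as a linear combination of
the eigenvectors:

$$\begin{pmatrix} 0 \\ 1 \end{pmatrix} = c_1 h_1 + c_2 h_2.$$

Comparing the first component shows that $c_2 = -c_1$. The second component yields
$c_1(a_1 - a_2) = 1$, that is, $c_1 = 1/\sqrt{5}$. So

$$\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \frac{1}{\sqrt{5}} h_1 - \frac{1}{\sqrt{5}} h_2.$$

Set this into (11); we get

$$\begin{pmatrix} f_n \\ f_{n+1} \end{pmatrix} = A^n \frac{1}{\sqrt{5}} (h_1 - h_2) = \frac{a_1^n}{\sqrt{5}} h_1 - \frac{a_2^n}{\sqrt{5}} h_2.$$

The first component of this vector equation is

$$f_n = a_1^n / \sqrt{5} - a_2^n / \sqrt{5}.$$

Since $|a_2^n| / \sqrt{5}$ is less than 1/2, and since $f_n$ is an integer, we can put this relation in
the following form:

$$f_n = \text{nearest integer to } \frac{a_1^n}{\sqrt{5}}.$$
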
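
The closed form can be compared with the recurrence in a short Python sketch. The function names are hypothetical, and the closed form is evaluated in floating point, so the comparison is only trusted for moderate n (here n up to 40), where rounding error stays far below 1/2.

```python
import math

def fib_recurrence(n):
    """f_n from the matrix form (10)': repeatedly map (f_k, f_{k+1}) to (f_{k+1}, f_k + f_{k+1})."""
    f, g = 0, 1                 # the vector (f_0, f_1)
    for _ in range(n):
        f, g = g, f + g         # one application of the matrix A
    return f

def fib_nearest_integer(n):
    """Nearest integer to a_1**n / sqrt(5), evaluated in floating point."""
    a1 = (1 + math.sqrt(5)) / 2
    return round(a1 ** n / math.sqrt(5))

print([fib_recurrence(n) for n in range(10)])
print(all(fib_recurrence(n) == fib_nearest_integer(n) for n in range(41)))
```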
EXERCISE 1. Calculate foo. 


We return now to the general case (6), (8). The characteristic polynomial of the 
matrix A, 


$$\det(aI - A) = p_A(a),$$

is a polynomial of degree n; the coefficient of the highest power $a^n$ is 1.

According to the fundamental theorem of algebra, a polynomial of degree n with
complex coefficients has n complex roots; some of the roots may be multiple. The 
roots of the characteristic polynomial are the eigenvalues of A. To make sure that 
these polynomials have a full set of roots, the spectral theory of linear maps is 
formulated in linear spaces over the field of complex numbers. 


Theorem 1. Eigenvectors of a matrix A corresponding to distinct eigenvalues 
are linearly independent. 


Proof. Suppose $a_i \ne a_k$ for $i \ne k$ and

$$Ah_i = a_i h_i, \quad h_i \ne 0. \tag{12}$$

Suppose now that there were a nontrivial linear relation among the $h_i$. There may be
several; since all $h_i \ne 0$, all involve at least two eigenvectors. Among them there is
one which involves the least number m of eigenvectors:

$$\sum_{j=1}^{m} b_j h_j = 0, \quad b_j \ne 0, \; j = 1, \ldots, m; \tag{13}$$

here we have renumbered the $h_j$. Apply A to (13) and use (12); we get

$$\sum_{j=1}^{m} b_j a_j h_j = 0. \tag*{(13)$'$}$$

Multiply (13) by $a_m$ and subtract from (13)':

$$\sum_{j=1}^{m} (b_j a_j - b_j a_m) h_j = 0. \tag*{(13)$''$}$$

Clearly the coefficient of $h_m$ is zero and none of the others is zero, so we have a linear
relation among the $h_j$ involving only m - 1 of the vectors, contrary to m being the
smallest number of vectors satisfying such a relation. □


Using Theorem 1 we deduce Theorem 2.


Theorem 2. If the characteristic polynomial of the n x n matrix A has n distinct 
roots, then A has n linearly independent eigenvectors. 


In this case the n eigenvectors form a basis; therefore every vector h in $\mathbb{C}^n$ can be
expressed as a linear combination of the eigenvectors:

$$h = \sum_{j=1}^{n} b_j h_j. \tag{14}$$

Applying $A^N$ to (14) and using (6)$_N$, we get

$$A^N h = \sum_{j=1}^{n} b_j a_j^N h_j. \tag*{(14)$_N$}$$


This formula can be used to answer the stability question raised at the beginning of 
this chapter: 


EXERCISE 2. (a) Prove that if A has n distinct eigenvalues $a_j$ and all of them are
less than one in absolute value, then for all h in $\mathbb{C}^n$,

$$A^N h \to 0 \quad \text{as } N \to \infty,$$

that is, all components of $A^N h$ tend to zero.
(b) Prove that if all $a_j$ are greater than one in absolute value, then for all h ≠ 0,

$$A^N h \to \infty \quad \text{as } N \to \infty,$$

that is, some components of $A^N h$ tend to infinity.
There are two simple and useful relations between the eigenvalues of A and the 
matrix A itself. 


Theorem 3. Denote by $a_1, \ldots, a_n$ the eigenvalues of A, with the same
multiplicity they have as roots of the characteristic equation of A. Then

$$\sum a_i = \operatorname{tr} A, \quad \prod a_i = \det A. \tag{15}$$

Proof. We claim that the characteristic polynomial of A has the form

$$p_A(s) = s^n - (\operatorname{tr} A)s^{n-1} + \cdots + (-1)^n \det A. \tag*{(15)$'$}$$

According to elementary algebra, the polynomial $p_A$ can be factored as

$$p_A(s) = \prod_{i=1}^{n} (s - a_i); \tag{16}$$

this shows that the coefficient of $s^{n-1}$ in $p_A$ is $-\sum a_i$, and the constant term is
$(-1)^n \prod a_i$. Comparing this with (15)' gives (15).
To prove (15)', we use first formula (16) in Chapter 5 for the determinant as a sum
of products:

$$p_A(s) = \det(sI - A) = \det\begin{pmatrix} s - a_{11} & -a_{12} & \cdots & -a_{1n} \\ -a_{21} & s - a_{22} & & \vdots \\ \vdots & & \ddots & \\ -a_{n1} & \cdots & & s - a_{nn} \end{pmatrix} = \sum_p \sigma(p) \prod_i (s\delta_{p_i i} - a_{p_i i}).$$

Clearly the terms of degree n and n - 1 in s come from the single product of the
diagonal elements:

$$\prod_i (s - a_{ii}) = s^n - (\operatorname{tr} A)s^{n-1} + \cdots.$$

This identifies the terms of order n and (n - 1) in (15)'. The term of order zero,
$p_A(0)$, is $\det(-A) = (-1)^n \det A$. This proves (15)' and completes the proof of
Theorem 3. □


EXERCISE 3. (a) Verify for the matrices discussed in Examples 1 and 2,

$$\begin{pmatrix} 3 & 2 \\ 1 & 4 \end{pmatrix}, \quad \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix},$$


that the sum of the eigenvalues equals the trace, and their product is the determinant 
of the matrix. 

Relation (6)$_N$, $A^N h = a^N h$, shows that if a is an eigenvalue of A, $a^N$ is an
eigenvalue of $A^N$. Now let q be any polynomial:

$$q(s) = \sum_N q_N s^N.$$

Multiplying (6)$_N$ by $q_N$ and summing we get

$$q(A)h = q(a)h. \tag{17}$$

The following result is called the spectral mapping theorem.

Theorem 4. (a) Let q be any polynomial, A a square matrix, a an eigenvalue of
A. Then q(a) is an eigenvalue of q(A).

(b) Every eigenvalue of q(A) is of the form q(a), where a is an eigenvalue of A.

Proof. Part (a) is merely a verbalization of relation (17), which shows also that A
and q(A) have h as common eigenvector.


To prove (b), let b denote an eigenvalue of q(A); that means that q(A) - bI is not
invertible. Now factor the polynomial q(s) - b:

$$q(s) - b = c \prod_i (s - r_i).$$

We may set A in place of s:

$$q(A) - bI = c \prod_i (A - r_i I).$$

By taking b to be an eigenvalue of q(A), the left-hand side is not invertible.
Therefore neither is the right-hand side. Since the right-hand side is a product, it
follows that at least one of the factors $A - r_i I$ is not invertible. That means that some
$r_i$ is an eigenvalue of A. Since $r_i$ is a root of q(s) - b,

$$q(r_i) = b.$$

This completes the proof of part (b). □


If in particular we take q to be the characteristic polynomial $p_A$ of A, we conclude
that all eigenvalues of $p_A(A)$ are zero. In fact a little more is true.


Theorem 5 (Cayley-Hamilton). Every matrix A satisfies its own characteristic
equation:

$$p_A(A) = 0. \tag{18}$$

Proof. If A has distinct eigenvalues, then according to Theorem 2 it has n linearly
independent eigenvectors $h_j$, j = 1, ..., n. Using (14) we apply $p_A(A)$:

$$p_A(A)h = \sum p_A(a_j) b_j h_j = \sum 0 = 0$$

for all h, proving (18) in this case. For a proof that holds for all matrices we use the
following lemma.


Lemma 6. Let P and Q be two polynomials with matrix coefficients

$$P(s) = \sum_j P_j s^j, \quad Q(s) = \sum_k Q_k s^k.$$

The product PQ = R is then

$$R(s) = \sum_l R_l s^l, \quad R_l = \sum_{j+k=l} P_j Q_k.$$

Suppose that the matrix A commutes with the coefficients of Q; then

$$P(A)Q(A) = R(A). \tag{19}$$

The proof is self-evident.
We apply Lemma 6 to Q(s) = sI - A and P(s) defined as the matrix of cofactors
of Q(s); that is,

$$P_{ij}(s) = (-1)^{i+j} D_{ji}(s), \tag{20}$$

$D_{ij}$ the determinant of the ijth minor of Q(s). According to the formula (30) of
Chapter 5,

$$P(s)Q(s) = \det Q(s) \, I = p_A(s) I, \tag{21}$$

where $p_A(s)$ is the characteristic polynomial of A. A commutes with the coefficients
of Q; therefore by Lemma 6 we may set s = A in (21). Since Q(A) = 0, it follows
that

$$p_A(A) = 0.$$

This proves Theorem 5. □
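
For a 2 x 2 matrix, formula (15)' reads $p_A(s) = s^2 - (\operatorname{tr} A)s + \det A$, so the Cayley-Hamilton theorem asserts that $A^2 - (\operatorname{tr} A)A + (\det A)I = 0$. The following Python sketch checks this entry by entry; the function name is hypothetical, and the three sample matrices are merely convenient integer examples.

```python
def cayley_hamilton_residual(A):
    """Return A^2 - (tr A) A + (det A) I for a 2 x 2 matrix A given as rows."""
    (a, b), (c, d) = A
    tr = a + d
    det = a * d - b * c
    A2 = [[a * a + b * c, a * b + b * d],
          [c * a + d * c, c * b + d * d]]
    return [[A2[i][j] - tr * A[i][j] + (det if i == j else 0)
             for j in range(2)] for i in range(2)]

samples = [[[3, 2], [1, 4]], [[0, 1], [1, 1]], [[3, 2], [-2, -1]]]
for A in samples:
    print(cayley_hamilton_residual(A))   # the zero matrix each time
```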
We are now ready to investigate matrices whose characteristic equation has
multiple roots. First a few examples.


Example 3. A = I,

$$p_I(s) = \det(sI - I) = (s - 1)^n;$$

1 is an n-fold zero. In this case every nonzero vector h is an eigenvector of A.


Example 4. $A = \begin{pmatrix} 3 & 2 \\ -2 & -1 \end{pmatrix}$, tr A = 2, det A = 1; therefore by Theorem 3,

$$p_A(s) = s^2 - 2s + 1,$$

whose roots are one, with multiplicity two. The equation

$$Ah = \begin{pmatrix} 3h_1 + 2h_2 \\ -2h_1 - h_2 \end{pmatrix} = \begin{pmatrix} h_1 \\ h_2 \end{pmatrix}$$

has as solution all vectors h whose components satisfy

$$h_1 + h_2 = 0.$$

All these are multiples of $h = \begin{pmatrix} 1 \\ -1 \end{pmatrix}$. So in this case A does not have two independent
eigenvectors.


We claim that if A has only one eigenvalue a and n linearly independent
eigenvectors, then A = aI. For in this case every vector in $\mathbb{C}^n$ can be written as in
(14), a linear combination of eigenvectors. Applying A to (14) and using $a_i = a$ for
i = 1, ..., n gives that

$$Ah = ah$$

for all h; then A = aI. We further note that every 2 x 2 matrix A with
tr A = 2, det A = 1 has 1 as a double root of its characteristic equation. These
matrices form a two-parameter family; only one member of this family, A = I, has
two linearly independent eigenvectors. This shows that, in general, when the
characteristic equation of A has multiple roots, we cannot expect A to have n linearly 
independent eigenvectors. 

To make up for this defect one turns to generalized eigenvectors. In the first
instance a generalized eigenvector f is defined as satisfying

$$(A - aI)^2 f = 0. \tag{22}$$

We show first that these behave almost as simply under applications of $A^N$ as the
genuine eigenvectors. We set

$$(A - aI) f = h. \tag{23}$$

Applying (A - aI) to this and using (22), we get

$$(A - aI)h = 0, \tag*{(23)$'$}$$

that is, h is a genuine eigenvector. We rewrite (23) and (23)' as

$$Af = af + h, \quad Ah = ah. \tag{24}$$

Applying A to the first equation of (24) and using the second equation gives

$$A^2 f = aAf + Ah = a^2 f + 2ah.$$

Repeating this N times gives

$$A^N f = a^N f + N a^{N-1} h. \tag{25}$$

EXERCISE 4. Verify (25) by induction on N.
EXERCISE 5. Prove that for any polynomial q,

$$q(A)f = q(a)f + q'(a)h, \tag{26}$$

where q' is the derivative of q and f satisfies (22).
Formula (25) shows that if |a| < 1, and f is a generalized eigenvector of A,
$A^N f \to 0$.
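
Formula (25) can be checked numerically on a small case. The sketch below assumes the 2 x 2 matrix with rows (3, 2) and (-2, -1), which has trace 2 and determinant 1 and hence the double eigenvalue a = 1; the choice f = (1, 0) and all function names are hypothetical. It does not replace the induction asked for in Exercise 4.

```python
def apply(A, v):
    """Apply a 2 x 2 matrix (list of rows) to a vector."""
    return [A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1]]

A = [[3, 2], [-2, -1]]    # assumed example: tr A = 2, det A = 1
a = 1                     # its double eigenvalue
f = [1, 0]                # a generalized eigenvector: (A - aI)^2 f = 0
h = [y - a * x for x, y in zip(f, apply(A, f))]   # h = (A - aI) f

def power_applied(N):
    """A^N f, computed by N successive applications of A."""
    v = f
    for _ in range(N):
        v = apply(A, v)
    return v

def formula_25(N):
    """Right-hand side of (25): a^N f + N a^(N-1) h, for N >= 1."""
    return [a ** N * f[i] + N * a ** (N - 1) * h[i] for i in range(2)]

print(h, apply(A, h))                      # h is a genuine eigenvector: Ah = ah
print(power_applied(5), formula_25(5))     # both sides of (25) for N = 5
```

Here a = 1, so the term $N a^{N-1} h$ grows linearly in N; for |a| < 1 the factor $a^{N-1}$ would outweigh N and force $A^N f \to 0$.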
We now generalize the notion of a generalized eigenvector. 
Definition. f is a generalized eigenvector of A, with eigenvalue a, if f ≠ 0 and

$$(A - aI)^m f = 0 \tag{27}$$


for some positive integer m. 
We state now one of the principal results of linear algebra.

Theorem 7 (Spectral Theorem). Let A be an n x n matrix with complex
entries. Every vector in $\mathbb{C}^n$ can be written as a sum of eigenvectors of A, genuine or
generalized.


For the proof, we need the following results of algebra. 


Lemma 8. Let p and q be a pair of polynomials with complex coefficients and
assume that p and q have no common zero. Then there are two other polynomials a
and b such that

$$ap + bq \equiv 1. \tag{28}$$

Proof. Denote by $\mathcal{J}$ all polynomials of the form ap + bq. Among them there is
one, nonzero, of lowest degree; call it d. We claim that d divides both p and q; for
suppose not; then the division algorithm yields a remainder r, say

$$r = p - md.$$

Since p and d belong to $\mathcal{J}$, so does p - md = r; since r has lower degree than d, this
is a contradiction.

We claim that d has degree zero; for if it had degree greater than zero, it would, by
the fundamental theorem of algebra, have a root. Since d divides p and q, this would
be a common root of p and q. Since we have assumed the contrary, deg d = 0
follows; since d ≠ 0, d = const., say = 1. This proves (28). □


Lemma 9. Let p and q be as in Lemma 8, and let A be a square matrix with
complex entries. Denote by $N_p$, $N_q$, and $N_{pq}$ the null spaces of p(A), q(A), and
p(A)q(A), respectively. Then $N_{pq}$ is the direct sum of $N_p$ and $N_q$:

$$N_{pq} = N_p \oplus N_q, \tag{29}$$

by which we mean that every x in $N_{pq}$ can be decomposed uniquely as

$$x = x_p + x_q, \quad x_p \text{ in } N_p, \; x_q \text{ in } N_q. \tag*{(29)$'$}$$

Proof. We replace the argument of the polynomials in (28) by A; we get

$$a(A)p(A) + b(A)q(A) = I. \tag{30}$$

Letting both sides act on x we obtain

$$a(A)p(A)x + b(A)q(A)x = x. \tag{31}$$

We claim that if x belongs to $N_{pq}$, then the first term on the left in (31) is in $N_q$, and
the second in $N_p$. To see this we use the commutativity of polynomials of the same
matrix:

$$q(A)a(A)p(A)x = a(A)p(A)q(A)x = 0,$$

since x belongs to the nullspace of p(A)q(A). This proves that the first term on the
left in (31) belongs to the nullspace of q(A); analogously the second term belongs to
the nullspace of p(A). This shows that (31) gives the desired decomposition (29)'.
To show that the decomposition is unique, we argue as follows: If

$$x = x_p + x_q = x_p' + x_q',$$

then

$$y = x_p - x_p' = x_q' - x_q$$

is an element that belongs to both $N_p$ and $N_q$. Let (30) act on y:

$$a(A)p(A)y + b(A)q(A)y = y.$$

Both terms on the left-hand side are zero; therefore so is the right-hand side, y. This
proves that $x_p = x_p'$, $x_q = x_q'$. □


Corollary 10. Let $p_1, \ldots, p_k$ be a collection of polynomials that are pairwise
without a common zero. Denote the nullspace of the product $p_1(A) \cdots p_k(A)$ by
$N_{p_1 \cdots p_k}$. Then

$$N_{p_1 \cdots p_k} = N_{p_1} \oplus \cdots \oplus N_{p_k}. \tag{32}$$

EXERCISE 6. Prove (32) by induction on k.

Proof of Theorem 7. Let x be any vector; the n + 1 vectors $x, Ax, A^2 x, \ldots, A^n x$
must be linearly dependent; therefore there is a polynomial p of degree less than or
equal to n such that

$$p(A)x = 0. \tag{33}$$

We factor p and rewrite this as

$$\prod_j (A - r_j I)^{m_j} x = 0, \tag*{(33)$'$}$$

$r_j$ the roots of p, $m_j$ their multiplicity. When $r_j$ is not an eigenvalue of A, $A - r_j I$ is
invertible; since the factors in (33)' commute, all invertible factors can be removed.
The remaining $r_j$ in (33)' are all eigenvalues of A. Denote

$$p_j(s) = (s - r_j)^{m_j}; \tag{34}$$

then (33)' can be written as $\prod_j p_j(A)x = 0$, that is, x belongs to $N_{p_1 \cdots p_k}$. Clearly the $p_j$
pairwise have no common zero, so Corollary 10 applies: x can be decomposed as a
sum of vectors in $N_{p_j}$. But by (34) and Definition (27), every $x_j$ in $N_{p_j}$ is a generalized
eigenvector. Thus we have a decomposition of x as a sum of generalized
eigenvectors, as asserted in Theorem 7. □


We have shown earlier in Theorem 5, the Cayley-Hamilton Theorem, that the
characteristic polynomial $p_A$ of A satisfies $p_A(A) = 0$. We denote by $\mathcal{J} = \mathcal{J}_A$ the set
of all polynomials p which satisfy p(A) = 0. Clearly, the sum of two polynomials in
$\mathcal{J}$ belongs to $\mathcal{J}$; furthermore, if p belongs to $\mathcal{J}$, so does every multiple of p. Denote
by $m = m_A$ a nonzero polynomial of smallest degree in $\mathcal{J}$; we claim that all p in $\mathcal{J}$
are multiples of m. Because, if not, then the division process

$$p = qm + r$$

gives a nonzero remainder r of lower degree than m. Clearly, r = p - qm belongs to $\mathcal{J}$,
contrary to the assumption that m is one of lowest degree. Except for a constant
factor, which we fix so that the leading coefficient of $m_A$ is 1, $m = m_A$ is unique. This
polynomial is called the minimal polynomial of A.

To describe precisely the minimal polynomial we return to the definition (27) of a
generalized eigenvector. We denote by $N_m = N_m(a)$ the nullspace of $(A - aI)^m$.
The subspaces $N_m$ consist of generalized eigenvectors; they are indexed
increasingly, that is,

$$N_1 \subset N_2 \subset \cdots. \tag{35}$$

Since these are subspaces of a finite-dimensional space, they must be equal from a
certain index on. We denote by d = d(a) the smallest such index, that is,

$$N_d = N_{d+1} = \cdots \tag*{(35)$'$}$$

but

$$N_{d-1} \ne N_d; \tag*{(35)$''$}$$

d(a) is called the index of the eigenvalue a.

EXERCISE 7. Show that A maps $N_d$ into itself.


Theorem 11. Let A be an n x n matrix; denote its distinct eigenvalues by
$a_1, \ldots, a_k$, and denote the index of $a_j$ by $d_j$. We claim that the minimal polynomial
$m_A$ is

$$m_A(s) = \prod_{i=1}^{k} (s - a_i)^{d_i}.$$


EXERCISE 8. Prove Theorem 11.
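
The index of an eigenvalue can be found by checking which power of A - aI first has the full generalized eigenspace as its nullspace. The Python sketch below does this in the simplest possible setting, a 2 x 2 matrix with the single eigenvalue a = 1, where the question reduces to whether A - I or its square is the zero matrix. The matrix with rows (3, 2) and (-2, -1) is assumed as the example, and the function names are hypothetical.

```python
def matmul2(A, B):
    """Product of two 2 x 2 matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

ZERO = [[0, 0], [0, 0]]

def index_of_eigenvalue_one(A):
    """Index d of the eigenvalue 1, for a 2 x 2 matrix whose only eigenvalue is 1."""
    M = [[A[i][j] - (1 if i == j else 0) for j in range(2)] for i in range(2)]  # A - I
    if M == ZERO:
        return 1                      # N_1 is already the whole space
    if matmul2(M, M) == ZERO:
        return 2                      # N_1 is smaller than N_2, the whole space
    raise ValueError("1 is not the only eigenvalue of A")

print(index_of_eigenvalue_one([[1, 0], [0, 1]]))    # identity matrix
print(index_of_eigenvalue_one([[3, 2], [-2, -1]]))  # assumed example
```

By Theorem 11 the minimal polynomial is then s - 1 for the identity and $(s - 1)^2$ for the second matrix, although both have the same characteristic polynomial $(s - 1)^2$.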


Let us denote $N_{d_j}(a_j)$ by $N^{(j)}$; then Theorem 7, the spectral theorem, can be
formulated as follows:

$$\mathbb{C}^n = N^{(1)} \oplus N^{(2)} \oplus \cdots \oplus N^{(k)}. \tag{36}$$

The dimension of $N^{(j)}$ equals the multiplicity of $a_j$ as the root of the characteristic
equation of A. Since our proof of this proposition uses calculus, we postpone it until
Theorem 11 of Chapter 9.

A maps each subspace $N^{(j)}$ into itself; such subspaces are called invariant under
A. We turn now to studying the action of A on each subspace; this action is
completely described by the dimensions of $N_1, N_2, \ldots, N_d$ in the following sense.


Theorem 12. (i) Suppose the pair of matrices A and B are similar in the sense
explained in Chapter 5 [see equation (37)],

$$A = SBS^{-1}, \tag{37}$$

S some invertible matrix. Then A and B have the same eigenvalues:

$$a_1 = b_1, \ldots, a_k = b_k; \tag{38}$$

furthermore, the nullspaces

$$N_m(a_j) = \text{nullspace of } (A - a_j I)^m$$

and

$$M_m(a_j) = \text{nullspace of } (B - a_j I)^m$$

have for all j and m the same dimensions:

$$\dim N_m(a_j) = \dim M_m(a_j). \tag{39}$$

(ii) Conversely, if A and B have the same eigenvalues, and if condition (39) about
the nullspaces having the same dimension is satisfied, then A and B are similar.


Proof. Part (i) is obvious; for if A and B are similar, so are A - aI and B - aI,
and so is any power of them:

$$(A - aI)^m = S(B - aI)^m S^{-1}. \tag{40}$$

Since S is a 1-to-1 mapping, the nullspaces of two similar matrices have the same
dimension. Relations (39), and in particular (38), follow from this observation.

The converse proposition will be proved in Appendix 15.

Theorems 4, 7, and 12 are the basic facts of the spectral theory of matrices. We 
wish to point out that the concepts that enter these theorems—eigenvalue, 
eigenvector, generalized eigenvector, index—remain meaningful for any mapping 
A of any finite dimensional linear space X over C into itself. The three theorems 
remain true in this abstract context and so do the proofs. 

The usefulness of spectral theory in an abstract setting is shown in the following 
important generalization of Theorem 7. 


Theorem 14. Denote by X a finite-dimensional linear space over the complex
numbers, by A and B linear maps of X into itself, which commute:

$$AB = BA. \tag{41}$$

Then there is a basis in X which consists of eigenvectors and generalized
eigenvectors of both A and B.


Proof. According to the Spectral Theorem, Theorem 7, equation (36), X can be
decomposed as a direct sum of generalized eigenspaces of A:

$$X = N^{(1)} \oplus \cdots \oplus N^{(k)},$$

$N^{(j)}$ the nullspace of $(A - a_j I)^{d_j}$. We claim that B maps $N^{(j)}$ into $N^{(j)}$; for B is
assumed to commute with A, and therefore commutes with $(A - aI)^d$:

$$B(A - aI)^d x = (A - aI)^d Bx. \tag{42}$$

If a is an eigenvalue and x belongs to $N^{(j)}$, the left-hand side of (42) is 0; therefore so
is the right-hand side, which proves that Bx is in $N^{(j)}$. Now we apply the Spectral
Theorem to the linear mapping B acting on $N^{(j)}$ and obtain a spectral decomposition
of each $N^{(j)}$ with respect to B. This proves Theorem 14. □


Corollary 15. Theorem 14 remains true if A, B are replaced by any number of 
pairwise commuting linear maps.

EXERCISE 9. Prove Corollary 15.


In Chapter 3 we defined the transpose A' of a linear map. When A is a matrix, that
is, a map $\mathbb{C}^n \to \mathbb{C}^n$, its transpose A' is obtained by interchanging the rows and
columns of A.


Theorem 16. Every square matrix A is similar to its transpose A'.


Proof. We have shown in Chapter 3, Theorem 6, that a mapping A of a space X
into itself, and its transpose A' mapping X' into itself, have nullspaces of the same
dimension. Since the transpose of A - aI is A' - aI, it follows that A and A' have the
same eigenvalues, and that their eigenspaces have the same dimension.

The transpose of $(A - aI)^j$ is $(A' - aI)^j$; therefore their nullspaces have the same
dimension. We can now appeal to Theorem 12 and conclude that A and A',
interpreted as matrices, are similar. □


Theorem 17. Let X be a finite-dimensional linear space over $\mathbb{C}$, A a linear
mapping of X into X. Denote by X' the dual of X, A': X' → X' the transpose of A. Let
a and b denote two distinct eigenvalues of A: a ≠ b, x an eigenvector of A with
eigenvalue a, l an eigenvector of A' with eigenvalue b. Then l and x annihilate each
other:

$$(l, x) = 0. \tag{43}$$

Proof. The transpose of A is defined in equation (9) of Chapter 3 by requiring
that for every x in X and every l in X'

$$(A'l, x) = (l, Ax).$$

If in particular we take x to be an eigenvector of A and l to be an eigenvector of A',

$$Ax = ax, \quad A'l = bl,$$

we deduce that

$$b(l, x) = a(l, x).$$

Since we have taken a ≠ b, (l, x) must be zero. □

Theorem 17 is useful in calculating and studying the properties of expansions of 
vectors x in terms of eigenvectors.

Theorem 18. Suppose the mapping A has n distinct eigenvalues $a_1, \ldots, a_n$.
Denote the corresponding eigenvectors of A by $x_1, \ldots, x_n$, those of A' by $l_1, \ldots, l_n$.
Then

(a) $(l_i, x_i) \ne 0$, i = 1, ..., n.

(b) Let

$$x = \sum_j k_j x_j \tag{44}$$

be the expansion of x as a sum of eigenvectors; then

$$k_i = (l_i, x) / (l_i, x_i), \quad i = 1, \ldots, n. \tag{45}$$
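
Formula (45) can be tried out on a small example. The Python sketch below uses a hypothetical upper triangular matrix with rows (2, 1) and (0, 3), chosen so that it is not one of the matrices of the exercises; its eigenvalues 2 and 3 are distinct, and the eigenvectors of the matrix and of its transpose listed in the code were worked out by hand. All names are illustrative.

```python
from fractions import Fraction

def dot(l, x):
    """The pairing (l, x) of two vectors of equal length."""
    return sum(li * xi for li, xi in zip(l, x))

# Hypothetical example: A = [[2, 1], [0, 3]], eigenvalues 2 and 3.
xs = [[1, 0], [1, 1]]     # eigenvectors of A
ls = [[1, -1], [0, 1]]    # eigenvectors of the transpose, same eigenvalue order

def coefficients(x):
    """Coefficients k_i of x in the eigenvector basis, from formula (45)."""
    return [Fraction(dot(l, x), dot(l, xi)) for l, xi in zip(ls, xs)]

x = [5, 7]
k = coefficients(x)
rebuilt = [sum(k[i] * xs[i][c] for i in range(2)) for c in range(2)]
print(k, rebuilt)         # the expansion (44) reproduces x
```

The denominators $(l_i, x_i)$ are nonzero by part (a), and the cross pairings $(l_i, x_j)$, i ≠ j, vanish by Theorem 17, which is why each coefficient can be read off separately.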


EXERCISE 10. Prove Theorem 18.


EXERCISE 11. Take the matrix

$$\begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}$$

from equation (10)' of Example 2.


(a) Determine the eigenvector of its transpose. 


(b) Use formulas (44) and (45) to determine the expansion of the vector (0, 1)' 
in terms of the eigenvectors of the original matrix. Show that your answer 
agrees with the expansion obtained in Example 2. 


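EXERCISE 12. In Example 1 we have determined the eigenvalues and
corresponding eigenvectors of the matrix

$$\begin{pmatrix} 3 & 2 \\ 1 & 4 \end{pmatrix}$$

as $a_1 = 2$, $h_1 = \begin{pmatrix} 2 \\ -1 \end{pmatrix}$, and $a_2 = 5$, $h_2 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$.

Determine eigenvectors $l_1$ and $l_2$ of its transpose and show that

$$(l_i, h_j) \begin{cases} = 0 & \text{for } i \ne j, \\ \ne 0 & \text{for } i = j. \end{cases}$$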

EXERCISE 13. Show that the matrix 


has 1 as an eigenvalue. What are the other two eigenvalues?


CHAPTER 7 


Euclidean Structure 


In this chapter we abstract the concept of Euclidean distance. We gain no greater 
generality; we gain simplicity, transparency and flexibility. 

We review the basic structure of Euclidean spaces. We choose a point 0 as origin
in real n-dimensional Euclidean space; the length of any vector x in space, denoted
as $\|x\|$, is defined as its distance to the origin.

Let us introduce a Cartesian coordinate system and denote the Cartesian
coordinates of x as $x_1, \ldots, x_n$. By repeated use of the Pythagorean theorem we can
express the length of x in terms of its Cartesian coordinates:

$$\|x\| = \sqrt{x_1^2 + \cdots + x_n^2}. \tag{1}$$

The scalar product of two vectors x and y, denoted as (x, y), is defined by

$$(x, y) = \sum_j x_j y_j. \tag{2}$$

Clearly, the two concepts are related; we can express the length of a vector as

$$\|x\|^2 = (x, x). \tag*{(2)$'$}$$

The scalar product is commutative:

$$(x, y) = (y, x) \tag{3}$$

and bilinear:


$$(x + u, y) = (x, y) + (u, y),$$
$$(x, y + v) = (x, y) + (x, v).$$

Using these algebraic properties of scalar product we can derive the identity

$$(x - y, x - y) = (x, x) - 2(x, y) + (y, y).$$

Using (2)', we can rewrite this identity as

$$\|x - y\|^2 = \|x\|^2 - 2(x, y) + \|y\|^2. \tag{4}$$

The term on the left is the distance of x from y, squared; the first and third terms
on the right are the distances of x and y from 0, squared. These three quantities
have geometric meaning; therefore they have the same value in any Cartesian
coordinate system. It follows therefore from (4) that also the scalar product (2) has
the same value in all Cartesian coordinate systems. By choosing special coordinate
axes, the first one through x, the second so that y is contained in the plane spanned by
the first two axes, we can uncover the geometric meaning of (x, y).

[Figure: the vectors x and y in the plane spanned by the first two coordinate axes.]

The coordinates of the vectors x and y in this coordinate system are
$x = (\|x\|, 0, \ldots, 0)$ and $y = (\|y\| \cos\theta, \ldots)$. Therefore

$$(x, y) = \|x\| \|y\| \cos\theta, \tag{5}$$

$\theta$ the angle between x and y.
The three points 0, x, y form a triangle whose sides are $a = \|x\|$, $b = \|y\|$,
$c = \|x - y\|$, with a and b forming an angle $\theta$ at 0.

Relations (4) and (5) can be written as

$$c^2 = a^2 + b^2 - 2ab\cos\theta. \tag*{(4)$'$}$$

This is the classical law of cosine; a special case of it, $\theta = \pi/2$, is the Pythagorean
theorem.

Most texts derive formula (5) for the scalar product from the law of cosine. This is 
a pedagogical blunder, for most students have long forgotten the law of cosine, if 
they ever knew it. 

We shall give now an abstract, that is axiomatic, definition of Euclidean space. 


Definition. A Euclidean structure in a linear space X over the reals is furnished 
by a real-valued function of two vector arguments called a scalar product and 


denoted as (x, y), which has the following properties: 


(i) (x, y) is a bilinear function; that is, it is a linear function of each argument 
when the other is kept fixed. 


(ii) It is symmetric:

$(x, y) = (y, x)$. (6)

(iii) It is positive:

$(x, x) > 0$ except for x = 0. (7)

Note that the scalar product (2) satisfies these axioms. We shall show now that,
conversely, all of Euclidean geometry is contained in these simple axioms.
We define the Euclidean length (also called norm) of x by

$\|x\| = (x, x)^{1/2}$. (8)

A scalar product is also called an inner product, or a dot product.

Definition. The distance of two vectors x and y in a linear space with Euclidean
norm is defined as $\|x - y\|$.


Theorem 1 (Schwarz Inequality). For all x, y,

$|(x, y)| \le \|x\| \|y\|$. (9)

Proof. Consider the function g(t) of the real variable t defined by

$g(t) = \|x + ty\|^2$. (10)

Using the definition (8) and properties (i) and (ii) we can write

$g(t) = \|x\|^2 + 2t(x, y) + t^2 \|y\|^2$. (10)'

Assume that $y \ne 0$ and set $t = -(x, y)/\|y\|^2$ in (10)'. Since (10) shows that
$g(t) \ge 0$ for all t, we get that

$\|x\|^2 - \frac{(x, y)^2}{\|y\|^2} \ge 0$.

This proves (9). For y = 0, (9) is trivially true. □

Note that for the concrete scalar product (2), inequality (9) follows from the
representation (5) of (x, y) as $\|x\| \|y\| \cos\theta$.
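
The proof of the Schwarz inequality can be checked numerically. The following is a minimal sketch for the concrete scalar product (2) on R^5; the helper names `dot`, `norm`, and `g`, and the random test vectors, are illustrative assumptions and not part of the text.

```python
import math
import random

def dot(x, y):
    """Standard scalar product (2) on R^n."""
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """Euclidean length (8): the square root of (x, x)."""
    return math.sqrt(dot(x, x))

def g(t, x, y):
    """The quadratic g(t) = ||x + t y||^2 used in the proof."""
    return norm([a + t * b for a, b in zip(x, y)]) ** 2

random.seed(0)  # fixed seed so the run is reproducible
x = [random.uniform(-1, 1) for _ in range(5)]
y = [random.uniform(-1, 1) for _ in range(5)]

# The value of t chosen in the proof.
t_min = -dot(x, y) / dot(y, y)

# At t_min the quadratic equals ||x||^2 - (x, y)^2 / ||y||^2.
gap = dot(x, x) - dot(x, y) ** 2 / dot(y, y)
schwarz_holds = abs(dot(x, y)) <= norm(x) * norm(y)
```

The quantity `gap` is the left-hand side of the last inequality in the proof; its nonnegativity is exactly (9).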


Theorem 2

$\|x\| = \max_{\|y\| = 1} (x, y)$. (11)

EXERCISE 1. Prove Theorem 2.

Theorem 3 (Triangle Inequality). For all x, y,

$\|x + y\| \le \|x\| + \|y\|$. (12)

Proof. Using the algebraic properties of scalar product, we derive, analogously to
(4), the identity

$\|x + y\|^2 = \|x\|^2 + 2(x, y) + \|y\|^2$ (12)'

and estimate the middle term by the Schwarz inequality. □

Motivated by (5) we make the following definitions.

Definition. Two vectors x and y are called orthogonal (perpendicular), denoted
as $x \perp y$, if

$(x, y) = 0$. (13)

From (12)' we deduce the Pythagorean theorem

$\|x + y\|^2 = \|x\|^2 + \|y\|^2$ if $x \perp y$. (13)'


Definition. Let X be a finite-dimensional linear space with a Euclidean structure,
$x^{(1)}, \ldots, x^{(n)}$ a basis for X. This basis is called orthonormal with respect to a given
Euclidean structure if

$(x^{(j)}, x^{(k)}) = 0$ for $j \ne k$, and $(x^{(j)}, x^{(k)}) = 1$ for $j = k$. (14)

Theorem 4 (Gram-Schmidt). Given an arbitrary basis $y^{(1)}, \ldots, y^{(n)}$ in a
finite-dimensional linear space equipped with a Euclidean structure, there is a related basis
$x^{(1)}, \ldots, x^{(n)}$ with the following properties:

(i) $x^{(1)}, \ldots, x^{(n)}$ is an orthonormal basis.
(ii) $x^{(k)}$ is a linear combination of $y^{(1)}, \ldots, y^{(k)}$, for all k.

Proof. We proceed recursively; suppose $x^{(1)}, \ldots, x^{(k-1)}$ have already been
constructed. We set

$x^{(k)} = c \left( y^{(k)} - \sum_{l=1}^{k-1} c_l x^{(l)} \right)$.

Since $x^{(1)}, \ldots, x^{(k-1)}$ are already orthonormal, it is easy to see that $x^{(k)}$ defined above
is orthogonal to them if we choose

$c_l = (y^{(k)}, x^{(l)})$, $l = 1, \ldots, k - 1$.

Finally we choose c so that $\|x^{(k)}\| = 1$. □
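
The recursive construction in the proof translates directly into a short program. The sketch below assumes the standard scalar product on R^3; the function name `gram_schmidt`, the tolerance `1e-12`, and the sample basis are illustrative choices, not taken from the text.

```python
import math

def dot(x, y):
    """Standard scalar product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt(ys):
    """Orthonormalize a list of linearly independent vectors."""
    xs = []
    for y in ys:
        # Subtract the components along the vectors already built,
        # using the coefficients c_l = (y^(k), x^(l)) from the proof.
        v = list(y)
        for x in xs:
            c = dot(y, x)
            v = [vi - c * xi for vi, xi in zip(v, x)]
        length = math.sqrt(dot(v, v))
        if length < 1e-12:
            raise ValueError("input vectors are linearly dependent")
        # The final scaling plays the role of the constant c.
        xs.append([vi / length for vi in v])
    return xs

basis = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
```

Each output vector depends only on the input vectors up to the same index, which is property (ii) of Theorem 4.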


Theorem 4 guarantees the existence of plenty of orthonormal bases. Given such a
basis, any x can be written as

$x = \sum_{j=1}^{n} a_j x^{(j)}$. (15)

Take the scalar product of (15) with $x^{(l)}$; using the orthonormality relations (14) we
get

$(x, x^{(l)}) = a_l$. (16)

Let y be any other vector in X; it can be expressed as

$y = \sum_k b_k x^{(k)}$.

Take the scalar product of y with x, using the expression (15). Then, using (14), we
get

$(x, y) = \sum_j \sum_k a_j b_k (x^{(j)}, x^{(k)}) = \sum_j a_j b_j$. (17)

In particular, for y = x we get

$\|x\|^2 = \sum_j a_j^2$. (17)'

Equation (17) shows that the mapping defined by (16),

$x \to (a_1, \ldots, a_n)$,

carries the space X with a Euclidean structure into $R^n$, and carries the scalar product
of X into the standard scalar product (2) of $R^n$.

Since the scalar product is bilinear, for y fixed (x, y) is a linear function of x. 
Conversely, we have the following theorem. 


Theorem 5. Every linear function l(x) on a finite-dimensional linear space X
with Euclidean structure can be written in the form

$l(x) = (x, y)$, (18)

y some element of X.

Proof. Introduce an orthonormal basis $x^{(1)}, \ldots, x^{(n)}$ in X; denote the value of l on
$x^{(k)}$ by

$l(x^{(k)}) = b_k$.

Set

$y = \sum_k b_k x^{(k)}$. (19)

It follows from orthonormality that $(x^{(k)}, y) = b_k$. This shows that (18) holds for
$x = x^{(k)}$, $k = 1, 2, \ldots, n$; but if two linear functions have the same value for all
vectors that form a basis, they have the same value for all vectors x. □

Corollary 5. The mapping $l \to y$ is an isomorphism of the Euclidean space X
with its dual.


Definition. Let X be a finite-dimensional linear space with Euclidean structure,
Y a subspace of X. The orthogonal complement of Y, denoted as $Y^\perp$, consists of all
vectors z in X that are orthogonal to every y in Y:

z in $Y^\perp$ if $(y, z) = 0$ for all y in Y.

Recall that in Chapter 2 we denoted by $Y^\perp$ the set of linear functionals that vanish
on Y. The notation $Y^\perp$ introduced above is consistent with the previous notation when
the dual of X is identified with X via (18). In particular, $Y^\perp$ is a subspace of X.


Theorem 6. For any subspace Y of X,

$X = Y \oplus Y^\perp$. (20)

The meaning of (20) is that every x in X can be decomposed uniquely as

$x = y + y^\perp$, y in Y, $y^\perp$ in $Y^\perp$. (20)'

Proof. We show first that a decomposition of form (20)' is unique. Suppose we
could write

$x = z + z^\perp$, z in Y, $z^\perp$ in $Y^\perp$.

Comparing this with (20)' gives

$y - z = z^\perp - y^\perp$.

It follows from this that y - z belongs both to Y and to $Y^\perp$, and thus is orthogonal to
itself:

$0 = (y - z, z^\perp - y^\perp) = (y - z, y - z) = \|y - z\|^2$;

but by positivity of norm, y - z = 0.

To prove that a decomposition of form (20)' is always possible, we construct an
orthonormal basis of X whose first k members lie in Y; the rest must lie in $Y^\perp$. We can
construct such a basis by starting with an orthonormal basis in Y, then complete it to a
basis in X, and then orthonormalize the rest of the basis by the procedure described
in Theorem 4. Then x can be decomposed as in (15). We break this decomposition
into two parts:

$x = \sum_{j=1}^{n} a_j x^{(j)} = \sum_{j=1}^{k} a_j x^{(j)} + \sum_{j=k+1}^{n} a_j x^{(j)} = y + y^\perp$; (21)

clearly, y lies in Y and $y^\perp$ in $Y^\perp$. □


In the decomposition (20)', the component y is called the orthogonal projection of
x into Y, denoted by

$y = P_Y x$. (22)

Theorem 7. (i) The mapping $P_Y$ is linear.
(ii) $P_Y^2 = P_Y$.

Proof. Let w be any vector in X, unrelated to x, and let its decomposition (20)' be

$w = z + z^\perp$, z in Y, $z^\perp$ in $Y^\perp$.

Adding this to (20)' gives

$x + w = (y + z) + (y^\perp + z^\perp)$,

the decomposition of x + w. This shows that $P_Y(x + w) = P_Y x + P_Y w$. Similarly,
$P_Y(kx) = k P_Y x$.

To show that $P_Y^2 = P_Y$, we take any x and decompose it as in (20)': $x = y + y^\perp$.
The vector $y = P_Y x$ needs no further decomposition: $P_Y y = y$. □


Theorem 8. Let Y be a linear subspace of the Euclidean space X, x some vector
in X. Then among all elements z of Y the one closest in Euclidean distance to x is
$P_Y x$.

Proof. Using the decomposition (20)' of x we have

$x - z = (y - z) + y^\perp$, $y = P_Y x$.

Since y and z both belong to Y, so does y - z.
Therefore by the Pythagorean theorem (13)',

$\|x - z\|^2 = \|y - z\|^2 + \|y^\perp\|^2$;

clearly this is smallest when z = y. Since the distance between two vectors x, z is
$\|x - z\|$, this proves Theorem 8. □
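
Theorems 7 and 8 can be illustrated on a small example. The sketch below takes Y to be the plane $x_3 = 0$ in R^3 with the obvious orthonormal basis; the function names `project` and `dist2` and the sample vectors are assumptions made for illustration only.

```python
def dot(x, y):
    """Standard scalar product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def project(x, onb):
    """Orthogonal projection of x onto the span of an orthonormal list."""
    p = [0.0] * len(x)
    for e in onb:
        c = dot(x, e)  # coefficient a_j = (x, x^(j)), as in (16)
        p = [pi + c * ei for pi, ei in zip(p, e)]
    return p

def dist2(u, v):
    """Squared Euclidean distance between u and v."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

# Y is the plane x3 = 0, spanned by the first two unit vectors.
onb = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
x = [3.0, -2.0, 5.0]
y = project(x, onb)                      # the component of x in Y
y_perp = [a - b for a, b in zip(x, y)]   # the component orthogonal to Y

# Some other element of Y, to compare distances with.
z = [2.0, 1.0, 0.0]
```

For this choice of z the Pythagorean identity in the proof reads 35 = 10 + 25, so z is indeed farther from x than y is.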


We turn now to linear mappings of a Euclidean space X into another Euclidean 
space U. Since a Euclidean space can be identified in a natural way with its own 
dual, the transpose of a linear map A of such a space X into U maps U into X. To 
indicate this distinction, and for yet another reason explained at the end of this
chapter, the transpose of a map A of Euclidean X into U is called the adjoint of A and
is denoted by $A^*$.

Here is the full definition of the adjoint $A^*$ of a linear mapping A of a Euclidean
space X into another Euclidean space U:

Given any u in U,

$l(x) = (Ax, u)$

is a linear function of x. According to Theorem 5, this linear function l(x) can be
represented as (x, y), y in X. Therefore for all x in X

$(x, y) = (Ax, u)$. (23)

The vector y depends on u; since scalar products are bilinear, y depends linearly on
u. We denote this dependence as $y = A^* u$, and rewrite (23) as

$(x, A^* u) = (Ax, u)$. (23)'

Note that $A^*$ maps U into X; the parentheses on the left denote the scalar product in
X, while those on the right denote the scalar product in U.
The next theorem lists the basic properties of adjointness:

Theorem 9. (i) If A and B are linear mappings of X into U, then

$(A + B)^* = A^* + B^*$.

(ii) If A is a linear map of X into U, while C is a linear map of U into V, then

$(CA)^* = A^* C^*$.

(iii) If A is a 1-to-1 mapping of X onto U, then

$(A^{-1})^* = (A^*)^{-1}$.

(iv) $(A^*)^* = A$.

Proof. (i) is an immediate consequence of (23)'; (ii) can be demonstrated in two
steps:

$(CAx, v) = (Ax, C^* v) = (x, A^* C^* v)$.

(iii) follows from (ii) applied to $A^{-1} A = I$, I the identity mapping, and the
observation that $I^* = I$. (iv) follows if we use the symmetry of the scalar product to
rewrite (23)' as

$(A^* u, x) = (u, Ax)$. □

When we take X to be $R^n$ and U to be $R^m$ with their standard Euclidean structures,
and interpret A and $A^*$ as matrices, they are transposes of each other.
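
The statement that the adjoint is the transpose can be verified directly from the defining relation (23)'. In the sketch below the matrix A, the vectors x and u, and the helper names are hypothetical examples chosen so that the arithmetic is exact.

```python
def dot(x, y):
    """Standard scalar product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def matvec(A, x):
    """Apply the matrix A (a list of rows) to the vector x."""
    return [dot(row, x) for row in A]

def transpose(A):
    """Interchange the rows and columns of A."""
    return [list(col) for col in zip(*A)]

A = [[1.0, 2.0], [0.0, 3.0], [4.0, -1.0]]  # maps R^2 into R^3
x = [2.0, -1.0]                            # a vector in R^2
u = [1.0, 5.0, 2.0]                        # a vector in R^3

lhs = dot(matvec(A, x), u)             # (Ax, u), scalar product in R^3
rhs = dot(x, matvec(transpose(A), u))  # (x, A*u), scalar product in R^2
```

Here Ax = (0, -3, 9) and the transpose applied to u gives (9, 15), so both scalar products equal 3.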

We present now an important application of the notion of the adjoint. 

There are many situations where quantities $x_1, \ldots, x_n$ cannot be measured
directly, but certain linear combinations of them,

$a_1 x_1 + \cdots + a_n x_n$,

can. Suppose that m such linear combinations have been measured. We can put all
this information in the form of a matrix equation

$Ax = p$, (24)

where $p_1, \ldots, p_m$ are the measured values, and A is an m x n matrix. We shall
examine the case where the number m of measurements exceeds the number n of
quantities whose value is of interest to us. Such a system of equations is
overdetermined and in general does not have a solution. This is not as alarming as it
sounds, because no measurement is perfect, and therefore none of the equations is
expected to hold exactly. In such a situation, we seek that vector x that comes closest
to satisfying all the equations in the sense that makes $\|Ax - p\|^2$ as small as
possible.

For such an x to be determined uniquely, A cannot have nonzero nullvectors. For
if Ay = 0, and x is a minimizer of $\|Ax - p\|$, then so is x + ky, k any number.


Theorem 10. Let A be an m x n matrix, m > n, and suppose that A has only the
trivial nullvector 0. The vector x that minimizes $\|Ax - p\|^2$ is the solution z of

$A^* A z = A^* p$. (25)

Proof. We show first that equation (25) has a unique solution. Since the range of
$A^*$ is $R^n$, (25) is a system of n equations for n unknowns. According to Corollary B
in Chapter 3, a unique solution is guaranteed if the homogeneous equation

$A^* A y = 0$ (25)'

has only the trivial solution y = 0. To see that this is the case, take the scalar product
of (25)' with y. We get, using the definition (23)' of adjointness, $0 = (A^* A y, y) =
(Ay, Ay) = \|Ay\|^2$. Since the norm is positive, it follows that Ay = 0. Since we have
assumed that A has only the trivial nullspace, y = 0 follows.

A maps $R^n$ into an n-dimensional subspace of $R^m$. Suppose z is a vector in $R^n$
with the following property:

Az - p is orthogonal to the range of A. We claim that such a z minimizes
$\|Ax - p\|^2$. To see this let x be any vector in $R^n$; split it as x = z + y; then

$Ax - p = A(z + y) - p = Az - p + Ay$.

By hypothesis Az - p and Ay are orthogonal; therefore by the Pythagorean
theorem,

$\|Ax - p\|^2 = \|Az - p\|^2 + \|Ay\|^2$;

this demonstrates the minimizing property of z.
To find z, we write the condition imposed on z in the form

$(Az - p, Ay) = 0$ for all y.

Using the adjoint of A we can rewrite this as

$(A^*(Az - p), y) = 0$ for all y.

The range of $A^*$ is $R^n$, so for this condition to hold for all y, $A^*(Az - p)$ must be 0,
which is equation (25) for z. □
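
Equation (25) can be tried out on a small overdetermined system. The sketch below fits a straight line $p \approx c_0 + c_1 t$ to four measurements, so that m = 4 and n = 2. The data values, the helper names, and the use of Cramer's rule for the 2 x 2 system are illustrative assumptions, not a method prescribed by the text.

```python
def dot(x, y):
    """Standard scalar product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def matvec(A, x):
    return [dot(row, x) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[dot(row, col) for col in zip(*B)] for row in A]

def solve_2x2(M, b):
    """Solve M z = b for a 2 x 2 matrix M by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    if det == 0:
        raise ValueError("matrix is singular")
    return [(b[0] * M[1][1] - M[0][1] * b[1]) / det,
            (M[0][0] * b[1] - M[1][0] * b[0]) / det]

# Hypothetical measurements p_i taken at times t_i.
ts = [0.0, 1.0, 2.0, 3.0]
p = [1.0, 3.0, 4.0, 7.0]

A = [[1.0, t] for t in ts]   # 4 x 2 matrix with only the trivial nullvector
At = transpose(A)            # the adjoint of A

# Solve A*A z = A*p, equation (25).
z = solve_2x2(matmul(At, A), matvec(At, p))

# The residual Az - p should be orthogonal to the range of A.
residual = [a - b for a, b in zip(matvec(A, z), p)]
orthogonality = matvec(At, residual)
```

For these data the system (25) has matrix ((4, 6), (6, 14)) and right-hand side (15, 32), giving z = (0.9, 1.9) up to rounding.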


Theorem 11. An orthogonal projection $P_Y$ defined in equation (22) is its own
adjoint,

$P_Y^* = P_Y$.

EXERCISE 2. Prove Theorem 11.


We turn now to the following question: what mappings M of a Euclidean space 
into itself preserve the distance of any pair of points, that is, satisfy for all x, y, 


$\|M(x) - M(y)\| = \|x - y\|$? (26)

Such a mapping is called an isometry. It is obvious from the definition that the
composite of two isometries is an isometry. An elementary example of an isometry is
translation:

$M(x) = x + a$,

a some fixed vector. Given any isometry, one can compose it with a translation and
produce an isometry that maps zero to zero. Conversely, any isometry is the
composite of one that maps zero to zero and a translation.


Theorem 12. Let M be an isometric mapping of a Euclidean space into itself
that maps zero to zero:

$M(0) = 0$. (27)

(i) M is linear.
(ii) $M^* M = I$. (28)
Conversely, if (28) is satisfied, M is an isometry.
(iii) M is invertible and its inverse is an isometry.
(iv) $\det M = \pm 1$.

Proof. It follows from (26) with y = 0 and (27) that

$\|M(x)\| = \|x\|$. (29)

Now let us abbreviate the action of M by a prime:

$M(x) = x'$, $M(y) = y'$.

By (29),

$\|x'\| = \|x\|$, $\|y'\| = \|y\|$. (29)'

By (26),

$\|x' - y'\| = \|x - y\|$.

Square and use expansion (4) on both sides:

$\|x'\|^2 - 2(x', y') + \|y'\|^2 = \|x\|^2 - 2(x, y) + \|y\|^2$.

Using (29)', we conclude that

$(x', y') = (x, y)$; (30)

that is, M preserves the scalar product.
Let z be any other vector, $z' = M(z)$; then, using (4) twice we get

$\|z' - x' - y'\|^2 = \|z'\|^2 + \|y'\|^2 + \|x'\|^2 - 2(z', x') - 2(z', y') + 2(x', y')$.

Similarly,

$\|z - x - y\|^2 = \|z\|^2 + \|y\|^2 + \|x\|^2 - 2(z, x) - 2(z, y) + 2(x, y)$.

Using (29)' and (30) we deduce that

$\|z' - x' - y'\|^2 = \|z - x - y\|^2$.

We choose now z = x + y; then the right-hand side above is zero; therefore so is
$\|z' - x' - y'\|^2$. By positive definiteness of the norm $z' - x' - y' = 0$. This proves
part (i) of Theorem 12.

To prove part (ii), we take relation (30) and use the adjointness identity (23)':

$(Mx, My) = (x, M^* M y) = (x, y)$

for all x and y, so

$(x, M^* M y - y) = 0$.

Since this holds for all x, it follows that $M^* M y - y$ is orthogonal to itself, and so, by
positiveness of norm, that for all y,

$M^* M y - y = 0$.

The converse follows by reversing the steps; this proves part (ii).

It follows from (29) that the nullspace of M consists of the zero vector; it follows
then from Corollary (B)' of Chapter 3 that M is invertible. That $M^{-1}$ is an isometry is
obvious. This proves (iii).

It was pointed out in equation (33) of Chapter 5 that for every matrix $\det M^* =
\det M$; it follows from (28) and the product rule for determinants [see (18)
in Chapter 5] that $(\det M)^2 = \det I = 1$, which implies that

$\det M = \pm 1$. (31)

This proves part (iv) of Theorem 12. □

The geometric meaning of (iv) is that a mapping that preserves distances also
preserves volume.

Definition. A matrix that maps $R^n$ onto itself isometrically is called orthogonal.


The orthogonal matrices of a given order form a group under matrix 
multiplication. Clearly, composites of isometries are isometric, and so, by part 
(iii) of Theorem 12, are their inverses. 

The orthogonal matrices whose determinant is plus 1 form a subgroup, called the 
special orthogonal group. Examples of orthogonal matrices with determinant plus 1
in three-dimensional space are rotations; see Chapter 11.

EXERCISE 3. Construct the matrix representing reflection of points in $R^3$ across
the plane $x_3 = 0$. Show that the determinant of this matrix is -1.

EXERCISE 4. Let R be reflection across any plane in $R^3$.

(i) Show that R is an isometry.
(ii) Show that $R^2 = I$.
(iii) Show that $R^* = R$.

We recall from Chapter 4 that the ijth entry of the matrix product AB is the scalar
product of the ith row of A with the jth column of B. The ith row of $M^*$ is the
transpose of the ith column of M. Therefore the identity $M^* M = I$ characterizing
orthogonal matrices can be formulated as follows:


Corollary 12. A matrix M is orthogonal iff its columns are pairwise orthogonal 
unit vectors. 
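
As a concrete instance of Theorem 12 and Corollary 12, one can take a rotation of the plane. The sketch below checks numerically that its columns are orthonormal, that its determinant is 1, and that it preserves length; the angle 0.7 and the test vector are arbitrary illustrative choices.

```python
import math

def dot(x, y):
    """Standard scalar product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def matvec(A, x):
    return [dot(row, x) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[dot(row, col) for col in zip(*B)] for row in A]

theta = 0.7  # arbitrary rotation angle, in radians
M = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]

# Entry (i, j) of M*M is the scalar product of columns i and j of M.
MtM = matmul(transpose(M), M)

det_M = M[0][0] * M[1][1] - M[0][1] * M[1][0]

x = [3.0, 4.0]  # a vector of length 5
Mx = matvec(M, x)
length_after = math.sqrt(dot(Mx, Mx))
```

Up to rounding, `MtM` is the identity matrix, which is relation (28) read column by column.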


EXERCISE 5. Show that a matrix M is orthogonal iff its rows are pairwise 
orthogonal unit vectors. 


How can we measure the size of a linear mapping A of one Euclidean space X into 
another Euclidean space U? Recall from a rigorous course on the foundations of 
calculus the concept of least upper bound, also called supremum, of a bounded set of 
real numbers, abbreviated as sup. Each component of Ax is a linear function of the
components of x; $\|Ax\|^2$ is a quadratic function of the components of x, and
therefore the set of numbers $\|Ax\|^2$, $\|x\|^2 = 1$ is a bounded set.

Definition

$\|A\| = \sup_{\|x\| = 1} \|Ax\|$. (32)

Note that $\|Ax\|$ is measured in U, $\|x\|$ in X. $\|A\|$ is called the norm of A.

Theorem 13. Let A be a linear mapping from the Euclidean space X into the
Euclidean space U, where $\|A\|$ is its norm.

(i) $\|Az\| \le \|A\| \|z\|$ for all z in X. (33)

(ii) $\|A\| = \sup_{\|x\| = 1, \|v\| = 1} (Ax, v)$. (34)

Proof. (i) follows for unit vectors z from the definition (32) of $\|A\|$. For any
$z \ne 0$, write z = kx, x a unit vector; since $\|Akx\| = \|kAx\| = |k| \|Ax\|$ and
$\|kx\| = |k| \|x\|$, (33) follows. For z = 0, (33) is obviously true.

(ii) According to Theorem 2,

$\|u\| = \max_{\|v\| = 1} (u, v)$.

Set Ax = u in definition (32), and we obtain (34). □

EXERCISE 6. Show that $|a_{ij}| \le \|A\|$.
Theorem 14. For A as in Theorem 13, we have the following: 


(i) $\|kA\| = |k| \|A\|$ for any scalar k.
(ii) For any pair of linear mappings A and B of X into U,

$\|A + B\| \le \|A\| + \|B\|$. (35)

(iii) Let A be a linear mapping of X into U, and let C be a linear mapping of U
into V; then

$\|CA\| \le \|C\| \|A\|$. (36)

(iv) $\|A^*\| = \|A\|$. (37)

Proof. (i) follows from the observation that $\|kAx\| = |k| \|Ax\|$.

(ii) By the triangle inequality (12), for all x in X we obtain

$\|(A + B)x\| = \|Ax + Bx\| \le \|Ax\| + \|Bx\|$.

The supremum of the left-hand side for $\|x\| = 1$ is $\|A + B\|$. The right-hand side
is a sum of two terms; the supremum of the sum is at most the sum of the suprema, which
is $\|A\| + \|B\|$.

(iii) By inequality (33),

$\|CAx\| \le \|C\| \|Ax\|$.

Combined with (33), this yields

$\|CAx\| \le \|C\| \|A\| \|x\|$.

Taking the supremum for all unit vectors x gives (36).

(iv) According to (23)',

$(Ax, v) = (x, A^* v)$;

since the scalar product is a symmetric function, we obtain

$(Ax, v) = (A^* v, x)$.

Take the supremum of both sides for all x and v, $\|x\| = 1$, $\|v\| = 1$. According to
(34), on the left-hand side we get $\|A\|$, and on the right-hand side we obtain
$\|A^*\|$. □


The following result is enormously useful: 


Theorem 15. Let A be a linear mapping of a finite-dimensional Euclidean 
space X into itself that is invertible. Denote by B another linear mapping of X into X 
close to A in the sense of the following inequality: 


$\|A - B\| < 1/\|A^{-1}\|$. (38)

Then B is invertible.

Proof. Denote A - B = C, so that B = A - C. Factor B as

$B = A(I - A^{-1}C) = A(I - S)$,

where $S = A^{-1}C$.
We have seen in Chapter 3 that the product of invertible maps is invertible;
therefore it suffices to show that I - S is invertible. We see that it suffices to show
that the nullspace of I - S is trivial. Suppose not; that is, (I - S)x = 0, $x \ne 0$. Then
x = Sx; using the definition of the norm of S,

$\|x\| = \|Sx\| \le \|S\| \|x\|$.

Since $x \ne 0$, it follows that

$1 \le \|S\|$. (39)

But according to part (iii) of Theorem 14,

$\|S\| = \|A^{-1}C\| \le \|A^{-1}\| \|C\| < 1$,

where in the last step we have used inequality (38): $\|C\| < 1/\|A^{-1}\|$. This
contradicts (39). □


Note. In this proof we have used the finite dimensionality of X. A proof of 
Theorem 15 given in Chapter 15 is valid for infinite-dimensional X. 

We recall now another concept from a rigorous calculus course: 

Convergence. A sequence of numbers $\{a_k\}$ tends to a,

$\lim a_k = a$,

if $|a_k - a|$ tends to zero. Recall furthermore the notion of a Cauchy sequence of
numbers $\{a_k\}$; it is a sequence for which $|a_j - a_k|$ tends to zero as j and k tend to $\infty$.
A basic property of real numbers is that every Cauchy sequence of numbers
converges to a limit.

This property of real numbers is called completeness. 

A second basic notion about real numbers is local compactness: Every bounded 
sequence of real numbers contains a convergent subsequence. 

We now show how to extend these notions and results from numbers to vectors in 
a finite-dimensional Euclidean space. 


Definition. A sequence of vectors $\{x_k\}$ in a linear space X with Euclidean
structure converges to the limit x:

$\lim_{k \to \infty} x_k = x$

if $\|x_k - x\|$ tends to zero as $k \to \infty$.


Theorem 16. A sequence of vectors $\{x_k\}$ in a Euclidean space X is called a
Cauchy sequence if $\|x_k - x_j\| \to 0$ as k and $j \to \infty$.

(i) Every Cauchy sequence in a finite-dimensional Euclidean space converges to
a limit.

A sequence of vectors $\{x_k\}$ in a Euclidean space X is called bounded if $\|x_k\| \le R$
for all k, R some real number.

(ii) In a finite-dimensional Euclidean space every bounded sequence contains a
convergent subsequence.

Proof. (i) Let x and y be two vectors in X, $a_j$ and $b_j$ their jth components; then

$|a_j - b_j| \le \|x - y\|$.

Denote by $a_{k,j}$ the jth component of $x_k$. Since $\{x_k\}$ is a Cauchy sequence, it follows
that the sequence of numbers $\{a_{k,j}\}$ also is a Cauchy sequence. Since the real
numbers are complete, the $\{a_{k,j}\}$ converge to a limit $a_j$. Denote by x the vector whose
components are $(a_1, \ldots, a_n)$. From the definition of Euclidean norm,

$\|x_k - x\|^2 = \sum_{j=1}^{n} |a_{k,j} - a_j|^2$, (40)

it follows that $\lim x_k = x$.
(ii) Since $|a_{k,j}| \le \|x_k\|$, it follows that $|a_{k,j}| \le R$ for all k. Because the real
numbers are locally compact, a subsequence of $\{a_{k,1}\}$ converges to a limit $a_1$.
This subsequence of the indices k contains a further subsubsequence such that $\{a_{k,2}\}$
converges to a limit $a_2$. Proceeding in this fashion we can construct a subsequence of
$\{x_k\}$ for which all sequences $\{a_{k,j}\}$ converge to a limit $a_j$, $j = 1, \ldots, n$, where n is the
dimension of X. Denote by x the vector whose components are $(a_1, \ldots, a_n)$. From
(40) we deduce that the subsequence of $\{x_k\}$ converges to x. □


It follows from part (ii) of Theorem 16 that the supremum in the definition (32) of
$\|A\|$ is a maximum:

$\|A\| = \max_{\|x\| = 1} \|Ax\|$. (32)'

It follows from the definition of supremum that $\|A\|$ cannot be replaced by any
smaller number that is an upper bound of $\|Ax\|$, $\|x\| = 1$. It follows that there is a
sequence of unit vectors $\{x_k\}$, $\|x_k\| = 1$, such that

$\lim \|Ax_k\| = \|A\|$.

According to Theorem 16, this sequence has a subsequence that converges to a limit
x. This vector x maximizes $\|Az\|$ for all unit vectors z.
Part (ii) of Theorem 16 has a converse: 


Theorem 17. Let X be a linear space with a Euclidean structure, and suppose 
that it is locally compact—that is, that every bounded sequence {x,} of vectors in X 
has a convergent subsequence. Then X is finite dimensional. 


Proof. We shall show that if X is not finite dimensional, then it is not locally
compact. Not being finite dimensional means that given any linearly independent set
of vectors $y_1, \ldots, y_k$, there is a vector $y_{k+1}$ that is not a linear combination of them. In
this way we obtain an infinite sequence of vectors $y_1, y_2, \ldots$ such that every finite set
$\{y_1, \ldots, y_k\}$ is linearly independent. According to Theorem 4, we can apply the
Gram-Schmidt process to construct pairwise orthogonal unit vectors $\{x_1, \ldots, x_k\}$,
which are linear combinations of $y_1, \ldots, y_k$. For this infinite sequence

$\|x_k - x_j\|^2 = \|x_k\|^2 - 2(x_k, x_j) + \|x_j\|^2 = 2$

for all $k \ne j$. Therefore this sequence, which is bounded, contains no convergent
subsequence. □


Theorem 17 is a very useful, and therefore important criterion for a Euclidean 
space to be finite dimensional. In Chapter 14 we shall show how to extend it to all 
normed linear spaces. 

In Appendix 12 we shall give an interesting application. 


Definition. A sequence $\{A_n\}$ of mappings converges to a limit A if

$\lim_{n \to \infty} \|A_n - A\| = 0$.

EXERCISE 7. Show that $\{A_n\}$ converges to A iff for all x, $A_n x$ converges to Ax.

Note. The result in Exercise 7 does not hold in infinite-dimensional spaces.
We conclude this chapter by a brief discussion of complex Euclidean structure. In 


the concrete definition of complex Euclidean space, definition (2) of the scalar
product in $R^n$ has to be replaced in $C^n$ by

$(x, y) = \sum_j x_j \bar{y}_j$, (41)

where the bar denotes the complex conjugate. The definition of the adjoint of a
matrix is as in (23)', but in the complex case has a slightly different interpretation.
Writing

$A = (a_{ij})$, $(Ax)_i = \sum_j a_{ij} x_j$

and using the definition (41) of scalar product we can write

$(Ax, u) = \sum_i \left( \sum_j a_{ij} x_j \right) \bar{u}_i$.

This can be rewritten as

$\sum_j x_j \left( \sum_i a_{ij} \bar{u}_i \right)$,

which shows that $(Ax, u) = (x, A^* u)$, where

$(A^* u)_j = \sum_i \bar{a}_{ij} u_i$;

that is, the adjoint $A^*$ of the matrix A is the complex conjugate of the transpose of A.
We now define the abstract notion of a complex Euclidean space.
We now define the abstract notion of a complex Euclidean space. 


Definition. A complex Euclidean structure in a linear space X over the complex 
numbers is furnished by a complex valued function of two vector arguments, called a 
scalar product and denoted as (x, y), with these properties: 


(i) (x, y) is a linear function of x for y fixed.
(ii) Conjugate symmetry: for all x, y,

$(x, y) = \overline{(y, x)}$. (42)

Note that conjugate symmetry implies that (x, x) is real for all x.
(iii) Positivity:

$(x, x) > 0$ for all $x \ne 0$.

The theory of complex Euclidean spaces is analogous to that for real ones, with a few
changes where necessary. For example, it follows from (i) and (ii) that for x fixed,
(x, y) is a skew linear function of y, that is, additive in y and satisfying for any
complex number k,

$(x, ky) = \bar{k}(x, y)$. (43)

Instead of repeating the theory, we indicate those places where a slight change is
needed. In the complex case identity (12)' is

$\|x + y\|^2 = \|x\|^2 + (x, y) + (y, x) + \|y\|^2 = \|x\|^2 + 2 \mathrm{Re}(x, y) + \|y\|^2$, (44)

where Re k denotes the real part of the complex number k.


EXERCISE 8. Prove the Schwarz inequality for complex linear spaces with a 
Euclidean structure. 


EXERCISE 9. Prove the complex analogues of Theorems 6, 7, and 8. 


We define the adjoint $A^*$ of a linear map A of an abstract complex Euclidean
space into itself by relation (23)' as before:

$(x, A^* u) = (Ax, u)$.

EXERCISE 10. Prove the complex analogue of Theorem 9.

We define isometric maps of a complex Euclidean space as in the real case:

$\|Mx\| = \|x\|$.

Definition. A linear map of a complex Euclidean space into itself that is
isometric is called unitary.

EXERCISE 11. Show that a unitary map M satisfies the relations

$M^* M = I$ (45)

and, conversely, that every map M that satisfies (45) is unitary.

EXERCISE 12. Show that if M is unitary, so are $M^{-1}$ and $M^*$.

EXERCISE 13. Show that the unitary maps form a group under multiplication.

EXERCISE 14. Show that for a unitary map M, $|\det M| = 1$.

EXERCISE 15. Let X be the space of continuous complex-valued functions on
[-1, 1] and define the scalar product in X by

$(f, g) = \int_{-1}^{1} f(s) \overline{g(s)} \, ds$.

Let m(s) be a continuous function of absolute value 1: $|m(s)| = 1$, $-1 \le s \le 1$.
Define M to be multiplication by m:

$(Mf)(s) = m(s) f(s)$.

Show that M is unitary.
We give now a simple but useful lower bound for the norm of a matrix mapping a
complex Euclidean space X into itself. The definition of the norm of such a matrix is
the same as in the real case, given by equation (32)':

$\|A\| = \max_{\|x\| = 1} \|Ax\|$.

Let A be any square matrix with complex entries, h one of its eigenvectors,
chosen to have length 1, and a the eigenvalue:

$Ah = ah$, $\|h\| = 1$.

Then

$\|Ah\| = \|ah\| = |a|$.

Since $\|A\|$ is the maximum of $\|Ax\|$ for all unit vectors x, it follows that
$\|A\| \ge |a|$. This is true for every eigenvalue; therefore

$\|A\| \ge \max_i |a_i|$, (46)

where the $a_i$ range over all eigenvalues of A.

Definition. The spectral radius r(A) of a linear mapping A of a linear space
into itself is

$r(A) = \max_i |a_i|$, (47)

where the $a_i$ range over all eigenvalues of A. So (46) can be restated as follows:

$\|A\| \ge r(A)$. (48)

Recall that the eigenvalues of the powers of A are the powers of the eigenvalues
of A:

$A^j h = a^j h$.

Applying (48) to $A^j$, we conclude that

$\|A^j\| \ge r(A)^j$.

Taking the jth root gives

$\|A^j\|^{1/j} \ge r(A)$. $(48)_j$

Theorem 18. As j tends to $\infty$, $(48)_j$ tends to be an equality; that is,

$\lim_{j \to \infty} \|A^j\|^{1/j} = r(A)$.

A proof will be furnished in Appendix 10.
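
The slow approach to equality in Theorem 18 can be observed on a small example. The sketch below uses a hypothetical upper triangular 2 x 2 matrix whose only eigenvalue is 0.5, so r(A) = 0.5. To evaluate the norm it relies on a standard fact not derived in this chapter: the norm of a real matrix B is the square root of the largest eigenvalue of the symmetric matrix formed from B and its transpose. The helper names are illustrative.

```python
import math

def matmul(A, B):
    """Product of two 2 x 2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def op_norm_2x2(B):
    """Norm of a real 2 x 2 matrix via the top eigenvalue of B^T B."""
    p = B[0][0] ** 2 + B[1][0] ** 2            # entry (1, 1) of B^T B
    q = B[0][0] * B[0][1] + B[1][0] * B[1][1]  # entry (1, 2) of B^T B
    r = B[0][1] ** 2 + B[1][1] ** 2            # entry (2, 2) of B^T B
    top = (p + r + math.sqrt((p - r) ** 2 + 4 * q * q)) / 2
    return math.sqrt(top)

A = [[0.5, 1.0], [0.0, 0.5]]  # triangular, so its eigenvalues are 0.5, 0.5
r_A = 0.5                     # spectral radius of A

estimates = {}
power = [[1.0, 0.0], [0.0, 1.0]]
for j in range(1, 201):
    power = matmul(power, A)  # power now holds A^j
    if j in (1, 10, 50, 200):
        estimates[j] = op_norm_2x2(power) ** (1.0 / j)
```

For this matrix the norm of A itself is about 1.21, well above r(A), while the jth roots in `estimates` decrease toward 0.5 as j grows, in agreement with (48) and Theorem 18.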
We shall give now a simple and useful upper bound for the norm of a real m x n
matrix

$A = (a_{ij})$,

mapping $R^n$ into $R^m$. For any x in $R^n$, set Ax = y, y in $R^m$. The components of y are
expressed in terms of the components of x as follows:

$y_i = \sum_j a_{ij} x_j$.

Estimate the right-hand side using the Schwarz inequality:

$y_i^2 \le \left( \sum_j a_{ij}^2 \right) \left( \sum_j x_j^2 \right)$;

adding all these inequalities, $i = 1, \ldots, m$, we get

$\sum_i y_i^2 \le \left( \sum_{i,j} a_{ij}^2 \right) \left( \sum_j x_j^2 \right)$. (49)

Using the definition of norm in Euclidean space [see equation (1)], we can rewrite
inequality (49) as

$\|y\|^2 \le \left( \sum_{i,j} a_{ij}^2 \right) \|x\|^2$.

Take the square root of this inequality; since y = Ax, we can write it as

$\|Ax\| \le \left( \sum_{i,j} a_{ij}^2 \right)^{1/2} \|x\|$. (50)

The definition of the norm $\|A\|$ of the matrix A is

$\sup \|Ax\|$, $\|x\| = 1$.

It follows from (50) that

$\|A\| \le \left( \sum_{i,j} a_{ij}^2 \right)^{1/2}$; (51)

this is the upper bound for $\|A\|$ we set out to prove.
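
The bound (51) is easy to test by brute force in two dimensions, where the unit vectors form a circle. The sketch below samples that circle for a hypothetical 3 x 2 matrix; the matrix entries, the number of sample points, and the helper names are illustrative assumptions.

```python
import math

def dot(x, y):
    """Standard scalar product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

def matvec(A, x):
    return [dot(row, x) for row in A]

A = [[2.0, -1.0], [1.0, 1.0], [0.0, 3.0]]  # maps R^2 into R^3

# Right-hand side of (51): square root of the sum of squared entries.
hs_bound = math.sqrt(sum(a * a for row in A for a in row))

# Approximate sup ||Ax|| over unit vectors x by sampling the unit circle.
sampled_max = 0.0
steps = 3600
for k in range(steps):
    angle = 2 * math.pi * k / steps
    x = [math.cos(angle), math.sin(angle)]
    sampled_max = max(sampled_max, norm(matvec(A, x)))
```

For this matrix the bound equals 4, while the sampled maximum is about 3.34, so (51) holds with room to spare. Sampling can only underestimate the supremum, which is why it serves as a check and not as a proof.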


EXERCISE 16. Prove the following analogue of (51) for matrices with complex
entries:

$\|A\| \le \left( \sum_{i,j} |a_{ij}|^2 \right)^{1/2}$. (51)'

EXERCISE 17. Show that

$\sum_{i,j} |a_{ij}|^2 = \mathrm{tr}\, AA^*$. (52)

EXERCISE 18. Show that

$\mathrm{tr}\, AA^* = \mathrm{tr}\, A^*A$.


EXERCISE 19. Find an upper bound and a lower bound for the norm of the 2 x 2 


matrix 
| 2 
«(1 2) 


The quantity $\left( \sum_{i,j} |a_{ij}|^2 \right)^{1/2}$ is called the Hilbert-Schmidt norm of the matrix A.

Let T denote a 3 x 3 matrix, its columns x, y, and z:

$T = (x, y, z)$.

The determinant of T is, for x and y fixed, a linear function of z:

$\det(x, y, z) = l(z)$. (53)

According to Theorem 5, every linear function can be represented as a scalar
product:

$l(z) = (w, z)$, (54)

where w is some vector depending on x and y:

$w = w(x, y)$.

Combining (53) and (54) gives

$\det(x, y, z) = (w(x, y), z)$. (55)

We formulate the properties of the dependence of w on x and y as a series of 
exercises: 


EXERCISE 20. (i) w is a bilinear function of x and y. Therefore we write w as a
product of x and y, denoted as

$w = x \times y$,

and called the cross product.
(ii) Show that the cross product is antisymmetric:

$y \times x = -x \times y$.

(iii) Show that $x \times y$ is orthogonal to both x and y.
(iv) Let R be a rotation in $R^3$; show that

$(Rx) \times (Ry) = R(x \times y)$.

(v) Show that

$\|x \times y\| = \|x\| \|y\| \sin\theta$,

where $\theta$ is the angle between x and y.
(vi) Show that

$\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \times \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$.

(vii) Using Exercise 16 in Chapter 5, show that

$\begin{pmatrix} a \\ b \\ c \end{pmatrix} \times \begin{pmatrix} d \\ e \\ f \end{pmatrix} = \begin{pmatrix} bf - ce \\ cd - af \\ ae - bd \end{pmatrix}$.

EXERCISE 21. Show that in a Euclidean space every pair of vectors satisfies

$\|u + v\|^2 + \|u - v\|^2 = 2\|u\|^2 + 2\|v\|^2$. (56)


CHAPTER 8 


Spectral Theory of Self-Adjoint 
Mappings of a Euclidean Space 
into Itself 


In this chapter we shall study mappings A of Euclidean spaces into themselves that 
are self-adjoint—that is, are their own adjoints: 


$A^* = A$.

When A acts on a real Euclidean space, any matrix representing it in an orthonormal
system of coordinates is symmetric, that is,

$A_{ij} = A_{ji}$.

Such mappings are therefore also called symmetric. When A acts on a complex
Euclidean space, its matrix representations are conjugate symmetric;

$A_{ij} = \overline{A_{ji}}$.


Such mappings are also called Hermitean. We saw in Theorem 11 of Chapter 7 that 
orthogonal projections are self-adjoint. Below we describe another large class of 
self-adjoint matrices. In Chapter 11 we shall see that matrices that describe the 
motion of mechanical systems are self-adjoint. 


Definition. Let M be an arbitrary linear mapping in a Euclidean space. We
define its self-adjoint part as

$$M_s = \frac{M + M^*}{2}. \qquad (1)$$

EXERCISE 1. Show that

$$\operatorname{Re}(x, Mx) = (x, M_s x). \qquad (2)$$


Let $f(x_1, \ldots, x_n) = f(x)$ be a real-valued twice-differentiable function of n real
variables $x_1, \ldots, x_n$ written as a single vector variable x. The Taylor approximation
to f at a up to second order reads

$$f(a + y) = f(a) + l(y) + \tfrac{1}{2} q(y) + \|y\|^2 \epsilon(\|y\|), \qquad (3)$$

where $\epsilon(d)$ denotes some function that tends to 0 as $d \to 0$, $l(y)$ is a linear function of
y, and $q(y)$ is a quadratic function. A linear function has the form (see Theorem 5 of
Chapter 7)

$$l(y) = (y, g); \qquad (4)$$

g is the gradient of f at a; according to Taylor's theorem

$$g_j = \frac{\partial f}{\partial x_j}\Big|_{x=a}. \qquad (5)$$

The quadratic function q has the form

$$q(y) = \sum_{i,j} h_{ij}\, y_i y_j. \qquad (6)$$

The matrix $(h_{ij})$ is called the Hessian H of f; according to Taylor's theorem,

$$h_{ij} = \frac{\partial^2 f}{\partial x_i\, \partial x_j}\Big|_{x=a}. \qquad (7)$$

Employing matrix notation and the Euclidean scalar product, we can write q, given
by (6), in the form

$$q(y) = (y, Hy). \qquad (8)$$

The matrix H is self-adjoint, that is, $H^* = H$:

$$h_{ij} = h_{ji}; \qquad (9)$$

this follows from definition (7), and the fact that the mixed partials of a twice-
differentiable function are equal.
Suppose now that a is a critical point of the function f, that is, a point where
grad f = g is zero. Around such a point Taylor's formula (3) shows that the behavior of f is
governed by the quadratic term. Now the behavior of functions near critical points
is of fundamental importance for dynamical systems, as well as in geometry; this is
what gives quadratic functions such an important place in mathematics and makes
the analysis of symmetric matrices such a central topic in linear algebra.

To study a quadratic function it is often useful to introduce new variables:

$$Ly = z, \qquad (10)$$

where L is some invertible matrix, in terms of which q has a simpler form.

Theorem 1. (a) Given a real quadratic form (6) it is possible to change
variables as in (10) so that in terms of the new variables, z, q is diagonal, that is, of
the form

$$q(L^{-1}z) = \sum_{i=1}^{n} d_i z_i^2. \qquad (11)$$

(b) There are many ways to introduce new variables which diagonalize q;
however, the number of positive, negative, and zero diagonal terms $d_i$ appearing in
(11) is the same in all of them.

Proof. Part (a) is entirely elementary and constructive. Suppose that one of the
diagonal elements of q is nonzero, say $h_{11} \ne 0$. We then group together all terms
containing $y_1$:

$$q(y) = h_{11} y_1^2 + \sum_{j=2}^{n} h_{1j}\, y_1 y_j + \sum_{j=2}^{n} h_{j1}\, y_j y_1 + \cdots,$$

where the dots stand for the terms that do not contain $y_1$. Since H is symmetric,
$h_{1j} = h_{j1}$; so we can write q as

$$q(y) = h_{11}\Bigl(y_1 + h_{11}^{-1} \sum_{j=2}^{n} h_{1j} y_j\Bigr)^2 - h_{11}^{-1}\Bigl(\sum_{j=2}^{n} h_{1j} y_j\Bigr)^2 + \cdots.$$

Set

$$y_1 + h_{11}^{-1} \sum_{j=2}^{n} h_{1j} y_j = z_1. \qquad (12)$$

We can then write

$$q(y) = h_{11} z_1^2 + q_2(y), \qquad (13)$$

where $q_2$ depends only on $y_2, \ldots, y_n$.

If all diagonal terms of q are zero but there is some nonzero off-diagonal term, say
$h_{12} = h_{21} \ne 0$, then we introduce $y_1 + y_2$ and $y_1 - y_2$ as new variables, which
produces a nonzero diagonal term. If all diagonal and off-diagonal terms are zero,
then $q(y) = 0$ and there is nothing to prove.

We now apply induction on the number of variables n; using (13) shows that if the
quadratic function $q_2$ in $(n - 1)$ variables can be written in form (11), then so can q
itself. Since $y_2, \ldots, y_n$ are related by an invertible matrix to $z_2, \ldots, z_n$, it follows
from (12) that the full set y is related to z by an invertible matrix. □


EXERCISE 2. We have described above an algorithm for diagonalizing q; 
implement it as a computer program. 
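One possible way to approach Exercise 2 is sketched below in exact rational arithmetic. It is a variant of the algorithm in the proof, not a transcription of it: completing the square is carried out as a symmetric row and column elimination, and when a diagonal entry vanishes a nonzero one is created by adding one variable to another rather than by passing to $y_1 + y_2$ and $y_1 - y_2$. The function names and the example matrix are assumptions made for illustration. The routine returns M and the diagonal entries d with $M^*HM = D$, so that y = Mz as in (16).

```python
from fractions import Fraction

def diagonalize_quadratic_form(h):
    """Return (m, d) with m invertible and m^T h m = diag(d), for symmetric h."""
    n = len(h)
    b = [[Fraction(v) for v in row] for row in h]            # working copy of H
    m = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

    def add_multiple(src, dst, c):
        # Congruence step B <- E^T B E, M <- M E with E = I + c * e_src e_dst^T.
        for i in range(n):
            b[i][dst] += c * b[i][src]       # column operation (B E)
        for j in range(n):
            b[dst][j] += c * b[src][j]       # matching row operation (E^T B)
        for i in range(n):
            m[i][dst] += c * m[i][src]       # accumulate M <- M E

    for k in range(n):
        if b[k][k] == 0:
            # Create a nonzero diagonal entry from a later variable, if possible.
            for j in range(k + 1, n):
                if b[j][j] != 0 or b[k][j] != 0:
                    add_multiple(j, k, Fraction(1))
                    if b[k][k] == 0:         # happens only when b_jj = -2 b_kj
                        add_multiple(j, k, Fraction(1))
                    break
        if b[k][k] == 0:
            continue                         # row k of the remaining form is zero
        for j in range(k + 1, n):
            if b[k][j] != 0:
                add_multiple(k, j, -b[k][j] / b[k][k])   # complete the square
    return m, [b[i][i] for i in range(n)]

def congruence(m, h):
    """Compute m^T h m exactly."""
    n = len(h)
    hm = [[sum(Fraction(h[i][k]) * m[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    return [[sum(m[k][i] * hm[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

H = [[0, 1, 0], [1, 0, 2], [0, 2, 3]]        # hypothetical example
M, d = diagonalize_quadratic_form(H)
signs = (sum(x > 0 for x in d), sum(x < 0 for x in d), sum(x == 0 for x in d))
print(d, signs)
```

Every elementary step has determinant 1, so M is invertible. The triple `signs` counts the positive, negative, and zero diagonal terms of (11).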


We turn now to part (b); denote by $p_+$, $p_-$, and $p_0$ the number of terms in (11) that
are positive, negative, and zero, respectively. We shall look at the behavior of q on
subspaces S of $\mathbb{R}^n$. We say that q is positive on the subspace S if

$$q(u) > 0 \quad \text{for every } u \text{ in } S,\ u \ne 0. \qquad (14)$$

Lemma 2. The dimension of the largest subspace of $\mathbb{R}^n$ on which q is positive
is $p_+$:

$$p_+ = \max \dim S, \quad q \text{ positive on } S. \qquad (15)$$

Similarly,

$$p_- = \max \dim S, \quad q \text{ negative on } S. \qquad (15)'$$

Proof. We shall use representation (11) for q in terms of the coordinates
$z_1, \ldots, z_n$; suppose we label them so that $d_1, \ldots, d_p$ are positive, $p = p_+$, the rest
nonpositive. Define the subspace $S_+$ to consist of all vectors for which
$z_{p+1} = \cdots = z_n = 0$. Clearly $\dim S_+ = p_+$, and equally clearly, q is positive on
$S_+$. This proves that $p_+$ is less than or equal to the right-hand side of (15). We claim
that equality holds. Let S be any subspace whose dimension exceeds $p_+$.
For any vector u in S, define Pu as the vector whose $p_+$ components are the same as
the first $p_+$ components of u, and the rest of the components are zero. The
dimension $p_+$ of the target space of this map is smaller than the dimension of the
domain space S. Therefore, according to Corollary A of Theorem 2, Chapter 3,
there is a nonzero vector y in the nullspace of P. By definition of P, the first $p_+$ of the
z-components of this vector y are zero. But then it follows from (11) that $q(y) \le 0$;
this shows that q is not positive on S. This proves (15); the proof of (15)' is
analogous. □


Lemma 2 shows that the numbers $p_+$ and $p_-$ can be defined in terms of the
quadratic form q itself, intrinsically, and are therefore independent of the special
choice of variables that puts q in form (11). Since $p_+ + p_- + p_0 = n$, this proves part
(b) of Theorem 1. □

Part (b) of Theorem 1 is called the law of inertia.
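A small worked example may make the law of inertia concrete. The form and the two changes of variables below are chosen purely for illustration and do not come from the text. Take $q(y) = 2 y_1 y_2$ in two variables; both diagonalizations have exactly one positive and one negative coefficient, although the coefficients themselves differ.

```latex
% First change of variables: z_1 = y_1 + y_2, z_2 = y_1 - y_2.
q(y) = 2 y_1 y_2
     = \tfrac{1}{2}\,(y_1 + y_2)^2 - \tfrac{1}{2}\,(y_1 - y_2)^2
     = \tfrac{1}{2}\, z_1^2 - \tfrac{1}{2}\, z_2^2 .

% Second change of variables: w_1 = 2(y_1 + y_2), w_2 = 3(y_1 - y_2).
q(y) = \tfrac{1}{8}\, w_1^2 - \tfrac{1}{18}\, w_2^2 .

% In both cases p_+ = 1, p_- = 1, p_0 = 0.
```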


EXERCISE 3. Prove that

$$p_+ + p_0 = \max \dim S, \quad q \ge 0 \text{ on } S$$

and

$$p_- + p_0 = \max \dim S, \quad q \le 0 \text{ on } S.$$


Using form (6) of q we can reinterpret Theorem 1 in matrix terms. It is convenient
for this purpose to express y in terms of z, rather than the other way around as in (10).
So we multiply (10) by $L^{-1}$, obtaining

$$y = Mz, \qquad (16)$$

where M abbreviates $L^{-1}$. Setting (16) into (8) gives, using the adjoint of M,

$$q(y) = (y, Hy) = (Mz, HMz) = (z, M^*HMz). \qquad (17)$$

Clearly, q in terms of z is of form (11) iff $M^*HM$ is a diagonal matrix. So part (a) of
Theorem 1 can be put in the following form:

Theorem 3. Given any real self-adjoint matrix H, there is a real invertible
matrix M such that

$$M^*HM = D, \qquad (18)$$

D a diagonal matrix.


For many applications it is of utmost importance to change variables so that the
Euclidean length of the old and the new variables is the same:

$$\|y\|^2 = \|z\|^2.$$

For the matrix M in (16) this means that M is an isometry. According to (28) of
Chapter 7, this is the case iff M is orthogonal, that is, satisfies

$$M^*M = I. \qquad (19)$$

It is one of the basic theorems of linear algebra, nay, of mathematics itself, that given
a real-valued quadratic form q, it is possible to diagonalize it by an isometric change
of variables. In matrix language, given a real symmetric matrix H, there is a real
invertible matrix M such that both (18) and (19) hold.

We shall give two proofs of this important result. The first is based on the spectral 
theory of general matrices presented in Chapter 6, specialized to self-adjoint 
mappings in complex Euclidean space. 
We recall from Chapter 7 that the adjoint $H^*$ of a linear map H of a complex
Euclidean space X into itself is defined by requiring that

$$(Hx, y) = (x, H^*y) \qquad (20)$$

hold for all pairs of vectors x, y. Here the bracket ( , ) is the conjugate-symmetric
scalar product introduced at the end of Chapter 7. A linear map H is called
self-adjoint if

$$H^* = H.$$

For H self-adjoint, (20) becomes

$$(Hx, y) = (x, Hy). \qquad (20)'$$

Theorem 4. A self-adjoint map H of complex Euclidean space X into itself has
real eigenvalues and a set of eigenvectors that form an orthonormal basis of X.

Proof. According to the principal result of spectral theory, Theorem 7 of Chapter
6, the eigenvectors and generalized eigenvectors of H span X. To deduce Theorem 4
from Theorem 7, we have to show that a self-adjoint mapping H has the following
additional properties:

(a) H has only real eigenvalues.
(b) H has no generalized eigenvectors, only genuine ones.
(c) Eigenvectors of H corresponding to different eigenvalues are orthogonal.

(a) If a + ib is an eigenvalue of H, then ib is an eigenvalue of H - aI, also self-
adjoint. Therefore, it suffices to show that a self-adjoint H cannot have a purely
imaginary eigenvalue ib. Suppose it did, with eigenvector z:

$$Hz = ibz.$$

Take the scalar product of both sides with z:

$$(Hz, z) = (ibz, z) = ib(z, z). \qquad (21)$$

Setting both x and y equal to z in (20)', we get

$$(Hz, z) = (z, Hz). \qquad (21)'$$

Since the scalar product is conjugate symmetric, we conclude that the two sides of
(21)' are conjugates. Since they are equal, the left-hand side of (21) is real. Therefore
so is the right-hand side; since (z, z) is positive, this can be only if b = 0, as asserted
in (a).
(b) A generalized eigenvector z satisfies

$$H^d z = 0; \qquad (22)$$

here we have taken the eigenvalue to be zero, by replacing H with H - aI. We want
to show that then z is a genuine eigenvector:

$$Hz = 0. \qquad (22)'$$

We take first the case d = 2:

$$H^2 z = 0; \qquad (23)$$

we take the scalar product of both sides with z:

$$(H^2 z, z) = 0. \qquad (23)'$$

Using (20)' with x = Hz, y = z, we get

$$(H^2 z, z) = (Hz, Hz) = \|Hz\|^2;$$

using (23)', we conclude that $\|Hz\| = 0$, which, by positivity, holds only when
Hz = 0. We do now an induction on d; we rewrite (22) as

$$H^2 H^{d-2} z = 0.$$

Abbreviating $H^{d-2}z$ as w, we rewrite this as $H^2 w = 0$; this implies, as we have
already shown, that Hw = 0. Using the definition of w this can be written as

$$H^{d-1} z = 0.$$

This completes the inductive step and proves (b).
(c) Consider two eigenvalues a and b of H, $a \ne b$:

$$Hx = ax, \qquad Hy = by.$$

We form the scalar product of the first relation with y and of the second with x; since
b is real we get

$$(Hx, y) = a(x, y), \qquad (x, Hy) = b(x, y).$$

By (20)' the left-hand sides are equal; therefore so are the right-hand sides. But for
$a \ne b$ this can be only if (x, y) = 0. This completes the proof of (c). □


Definition. The set of eigenvalues of H is called the spectrum of H. 
We show now that Theorem 4 has the consequence that real quadratic forms can
be diagonalized by real isometric transformation. Using the matrix formulation
given in Theorem 3, we state the result as follows.

Theorem 4'. Given any real self-adjoint matrix H, there is an orthogonal matrix
M such that

$$M^*HM = D, \qquad (24)$$

D a diagonal matrix whose entries are the eigenvalues of H. M satisfies $M^*M = I$.
Proof. The eigenvectors f of H satisfy

$$Hf = af. \qquad (25)$$

H is a real matrix, and according to (a), the eigenvalue a is real. It follows from (25)
that the real and imaginary parts of f also are eigenvectors. It follows from this easily
that we may choose an orthonormal basis consisting of real eigenvectors in each
eigenspace $N_a$. Since by (c), eigenvectors belonging to distinct eigenvalues are
orthogonal, we have an orthonormal basis of X consisting of real eigenvectors $f_j$ of H.
Every vector y in X can be expressed as a linear combination of these eigenvectors:

$$y = \sum_j z_j f_j. \qquad (25)'$$

For y real, the $z_j$ are real. We denote the vector with components $z_j$ as z:
$z = (z_1, \ldots, z_n)$. Since the $\{f_j\}$ form an orthonormal basis,

$$\|y\|^2 = \sum_j z_j^2. \qquad (26)$$

Letting H act on (25)', we get, using (25), that

$$Hy = \sum_j z_j a_j f_j. \qquad (25)''$$

Setting (25)' and (25)'' into (6) we can express the quadratic form q as

$$q(y) = (y, Hy) = \sum_j a_j z_j^2. \qquad (26)'$$

This shows that the introduction of the new variables z diagonalizes the quadratic
form q. Relation (26) says that the new vector has the same length as the old.

Denote by M the relation of z to y:

$$y = Mz.$$

Set this into (26)'; we get

$$q(y) = (y, Hy) = (Mz, HMz) = (z, M^*HMz).$$

Using (26)', we conclude that $M^*HM = D$, as claimed in (24). This completes the
proof of Theorem 4'. □


Multiply (24) by M on the left and $M^*$ on the right. Since $MM^*$ also equals I for
an isometry M, we get

$$H = MDM^*. \qquad (24)'$$
EXERCISE 4. Show that the columns of M are the eigenvectors of H. 
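For a 2 x 2 real symmetric matrix, Theorem 4' can be illustrated with a single plane rotation. The sketch below is an assumption-laden illustration, not the method of the text: it picks the rotation angle from the standard Jacobi formula $\tan 2\theta = 2b/(a - c)$, builds the orthogonal matrix M, and checks numerically that $M^*HM$ is diagonal and that the first column of M is an eigenvector. All names and the example matrix are hypothetical.

```python
import math

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def diagonalize_symmetric_2x2(h):
    """Return (m, d) with m a rotation and d = m^T h m diagonal up to rounding."""
    (a, b), (_, c) = h
    theta = 0.5 * math.atan2(2.0 * b, a - c)   # angle that kills the off-diagonal entry
    cs, sn = math.cos(theta), math.sin(theta)
    m = [[cs, -sn], [sn, cs]]
    d = matmul(transpose(m), matmul(h, m))
    return m, d

H = [[2.0, 1.0], [1.0, 2.0]]                    # hypothetical example
M, D = diagonalize_symmetric_2x2(H)
print(D)                                        # close to diag(3, 1)
```

Multiplying H by a column of M reproduces that column scaled by the corresponding diagonal entry of D, which is the content of Exercise 4 in this small case.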


We restate now Theorem 4, the spectral theorem for self-adjoint maps, in a
slightly different language. Theorem 4 asserts that the whole space X can be
decomposed as the direct sum of pairwise orthogonal eigenspaces:

$$X = N^{(1)} \oplus \cdots \oplus N^{(k)}, \qquad (27)$$

where $N^{(j)}$ consists of eigenvectors of H with real eigenvalue $a_j$, $a_j \ne a_i$ for $j \ne i$.
That means that each x in X can be decomposed uniquely as the sum

$$x = x^{(1)} + \cdots + x^{(k)}, \qquad (27)'$$

where $x^{(j)}$ belongs to $N^{(j)}$. Since $N^{(j)}$ consists of eigenvectors, applying H to
(27)' gives

$$Hx = a_1 x^{(1)} + \cdots + a_k x^{(k)}. \qquad (28)$$

Each $x^{(j)}$ occurring in (27)' is a function of x; we denote this dependence as

$$x^{(j)} = P_j(x).$$

Since the $N^{(j)}$ are linear subspaces of X, it follows that $x^{(j)}$ depends linearly on x, that
is, the $P_j$ are linear mappings. We can rewrite (27)' and (28) as follows:

$$I = \sum_j P_j, \qquad (29)$$

$$H = \sum_j a_j P_j. \qquad (30)$$


Claim: The operators $P_j$ have the following properties:

(a) $P_j P_k = 0$ for $j \ne k$, $\quad P_j^2 = P_j$. (31)
(b) Each $P_j$ is self-adjoint:

$$P_j^* = P_j. \qquad (32)$$

Proof. (a) Relations (31) are immediate consequences of the definition of $P_j$.
(b) Using the expansion (27)' for x and the analogous one for y we get

$$(P_j x, y) = (x^{(j)}, y) = \Bigl(x^{(j)}, \sum_i y^{(i)}\Bigr) = (x^{(j)}, y^{(j)}),$$

where in the last step we have used the orthogonality of $N^{(j)}$ to $N^{(i)}$ for $j \ne i$. Similarly
we can show that

$$(x, P_j y) = (x^{(j)}, y^{(j)}).$$

Putting the two together shows that

$$(P_j x, y) = (x, P_j y).$$

According to (20), this expresses the self-adjointness of $P_j$. This proves
(32). □


We recall from Chapter 7 that a self-adjoint operator P which satisfies $P^2 = P$ is
an orthogonal projection. A decomposition of the form (29), where the P; satisfy 
(31), is called a resolution of the identity. H in form (30) gives the spectral resolution 
of H. 

We can now restate Theorem 4 as 


Theorem 5. Let X be a complex Euclidean space, $H: X \to X$ a self-adjoint
linear map. Then there is a resolution of the identity, in the sense of (29), (31), and 
(32) that gives a spectral resolution (30) of H. 


The restated form of the spectral theorem is very useful for defining functions of 
self-adjoint operators. We remark that its greatest importance is as the model for the 
infinite-dimensional version. 

Squaring relation (30) and using properties (31) of the $P_j$ we get

$$H^2 = \sum_j a_j^2 P_j.$$

By induction, for any natural number m,

$$H^m = \sum_j a_j^m P_j.$$

It follows that for any polynomial p,

$$p(H) = \sum_j p(a_j) P_j. \qquad (33)$$

Let f(a) be any real valued function defined on the spectrum of H. We define f(H) by
formula (33):

$$f(H) = \sum_j f(a_j) P_j. \qquad (33)'$$

An example:

$$e^{H} = \sum_j e^{a_j} P_j.$$

We shall say more about this in Chapter 9.
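The spectral resolution and formula (33)' can be verified exactly on a small example. In the sketch below the matrix H, its eigenvalues 1 and 3, and the two projections are assumptions chosen for illustration; `function_of_h` is a hypothetical helper that evaluates $\sum_j f(a_j) P_j$.

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

half = Fraction(1, 2)
H = [[2, 1], [1, 2]]                     # hypothetical self-adjoint example
P1 = [[half, -half], [-half, half]]      # orthogonal projection onto span{(1, -1)}, eigenvalue 1
P3 = [[half, half], [half, half]]        # orthogonal projection onto span{(1, 1)}, eigenvalue 3
spectrum = [(1, P1), (3, P3)]

def function_of_h(f):
    """f(H) = sum_j f(a_j) P_j, as in formula (33)'."""
    return [[sum(f(a) * p[i][j] for a, p in spectrum) for j in range(2)] for i in range(2)]

identity = function_of_h(lambda a: 1)    # resolution of the identity (29)
h_again = function_of_h(lambda a: a)     # spectral resolution (30)
h_cubed = function_of_h(lambda a: a ** 3)
print(identity, h_again, h_cubed)
```

Here `h_cubed` agrees with the matrix product H·H·H, which is the case m = 3 of the formula for $H^m$.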
We present a series of no-cost extensions of Theorem 5. 


Theorem 6. Suppose H and K are a pair of self-adjoint matrices that commute:

$$H^* = H, \qquad K^* = K, \qquad HK = KH.$$

Then they have a common spectral resolution, that is, there exist orthogonal
projections satisfying (29), (31), and (32) so that (30) holds, as well as

$$K = \sum_j b_j P_j. \qquad (30)'$$

Proof. Denote by N one of the eigenspaces of H; then for every x in N

$$Hx = ax.$$

Applying K, we get

$$KHx = aKx.$$

Since H and K commute, we can rewrite this as

$$HKx = aKx,$$

which shows that Kx is an eigenvector of H. So K maps N into itself. The restriction
of K to N is self-adjoint. We now apply spectral resolution of K over N; combining
all these resolutions gives the joint spectral resolution of H and K. □
This result can be generalized to any finite collection of pairwise commuting
self-adjoint mappings.

Definition. A linear mapping A of Euclidean space into itself is called anti-
self-adjoint if

$$A^* = -A.$$

It follows from the definition of adjoint and the property of conjugate symmetry
of the scalar product that for any linear map M of a complex Euclidean space into
itself,

$$(iM)^* = -iM^*. \qquad (34)$$

In particular, if A is anti-self-adjoint, iA is self-adjoint, and Theorem 4 applies. This
yields Theorem 7.


Theorem 7. Let A be an anti-self-adjoint mapping of a complex Euclidean 
space into itself. Then 


(a) The eigenvalues of A are purely imaginary. 
(b) We can choose an orthonormal basis consisting of eigenvectors of A. 


We introduce now a class of maps that includes self-adjoint, anti-self-adjoint, and 
unitary maps as special cases. 


Definition. A mapping N of a complex Euclidean space into itself is called 
normal if it commutes with its adjoint: 


$$NN^* = N^*N.$$


Theorem 8. A normal map N has an orthonormal basis consisting of 
eigenvectors. 


Proof. If N and $N^*$ commute, so do

$$H = \frac{N + N^*}{2} \quad \text{and} \quad A = \frac{N - N^*}{2}. \qquad (35)$$

Clearly, H is self-adjoint and A is anti-self-adjoint. According to Theorem 6 applied to H
and K = iA, they have a common spectral resolution, so that there is an orthonormal
basis consisting of common eigenvectors of both H and A. But since by (35),

$$N = H + A, \qquad (35)'$$

it follows that these are also eigenvectors of N as well as of $N^*$. □


Here is an application of Theorem 8. 


Theorem 9. Let U be a unitary map of a complex Euclidean space into itself, 
that is, an isometric linear map. 


(a) There is an orthonormal basis consisting of genuine eigenvectors of U. 
(b) The eigenvalues of U are complex numbers of absolute value = 1. 


Proof. According to equation (42) of Chapter 7, an isometric map U satisfies
$U^*U = I$. This relation says that $U^*$ is a left inverse for U. We have shown in Chapter
3 (see Corollary B of Theorem 1 there) that a mapping that has a left inverse is
invertible, and its left inverse is also its right inverse: $UU^* = I$. These relations show
that U commutes with $U^*$; thus U is normal and Theorem 8 applies, proving part (a).
To prove part (b), let f be an eigenvector of U, with eigenvalue u: Uf = uf. It
follows that $\|Uf\| = \|uf\| = |u|\,\|f\|$. Since U is isometric, |u| = 1. □


Our first proof of the spectral resolution of self-adjoint mappings is based on the 
spectral resolution of general linear mappings. This necessitates the application of 
the fundamental theorem of algebra on the existence of complex roots, which then 
are shown to be real. The question is inescapable: Is it possible to prove the 
spectral resolution of self-adjoint mappings without resorting to the fundamental 
theorem of algebra? The answer is "Yes." The new proof, given below, is in every
respect superior to the first proof. Not only does it avoid the fundamental theorem 
of algebra, but in the case of real symmetric mappings it avoids the use of complex 
numbers. It gives a variational characterization of eigenvalues that is very useful in 
estimating the location of eigenvalues; this will be exploited systematically in 
Chapter 10. Most important, the new proof can be carried over to infinite- 
dimensional spaces. 


Second Proof of Theorem 4. We start by assuming that X has an orthonormal
basis of eigenvectors of H. We use the representations (26) and (26)' to write

$$\frac{(x, Hx)}{(x, x)} = \frac{\sum_i a_i z_i^2}{\sum_i z_i^2}. \qquad (36)$$

We arrange the $a_i$ in increasing order:

$$a_1 \le a_2 \le \cdots \le a_n. \qquad (36)'$$

It is clear from (36) that choosing $z_1 \ne 0$ and all the other $z_i$, $i = 2, \ldots, n$, equal to 0
makes (36) as small as possible. So

$$a_1 = \min_{x \ne 0} \frac{(x, Hx)}{(x, x)}. \qquad (37)$$

Similarly,

$$a_n = \max_{x \ne 0} \frac{(x, Hx)}{(x, x)}. \qquad (37)'$$

The minimum and maximum, respectively, are taken on at points x = f that are
eigenvectors of H with eigenvalues $a_1$ and $a_n$, respectively.

We shall show now, without using the representation (36), that the minimum 
problem (37) has a solution and that this solution is an eigenvector of H. From this 
we shall deduce, by induction, that H has a full set of eigenvectors. 

The quotient (36) is called the Rayleigh quotient of H and is abbreviated by
$R = R_H$. The numerator is abbreviated, see (6), as q; we shall denote the
denominator by p,

$$R(x) = \frac{q(x)}{p(x)} = \frac{(x, Hx)}{(x, x)}.$$

Since H is self-adjoint, by (21)' R is real-valued; furthermore, R is a homogeneous
function of x of degree zero, that is, for every nonzero scalar k,

$$R(kx) = R(x).$$

Therefore in seeking its maximum or minimum, it suffices to confine the search to
the unit sphere $\|x\| = 1$. In Chapter 7, Theorem 15, we have shown that in a finite-
dimensional Euclidean space X, every sequence of vectors on the unit sphere has a
convergent subsequence. It follows that R(x) takes on its minimum at some point of
the unit sphere; call this point f. Let g be any other vector and t be a real variable;
R(f + tg) is the quotient of two quadratic functions of t.

Using the self-adjointness of H and the conjugate symmetry of the scalar product
we can express R(f + tg) as

$$R(f + tg) = \frac{(f, Hf) + 2t \operatorname{Re}(g, Hf) + t^2 (g, Hg)}{(f, f) + 2t \operatorname{Re}(g, f) + t^2 (g, g)} = \frac{q(t)}{p(t)}. \qquad (38)$$

Since R achieves its minimum at f, R(f + tg) achieves its minimum at t = 0; by
calculus its derivative there is zero:

$$\frac{d}{dt} R(f + tg)\Big|_{t=0} = \dot R = \frac{\dot q\, p - q\, \dot p}{p^2} = 0.$$

Since $\|f\| = 1$, p = 1; denoting R(f) = min R by a, we can rewrite the above as

$$\dot R = \dot q - a \dot p = 0. \qquad (38)'$$

Using (38), we get readily

$$\dot q(f + tg)\big|_{t=0} = 2 \operatorname{Re}(g, Hf), \qquad \dot p(f + tg)\big|_{t=0} = 2 \operatorname{Re}(g, f).$$

Setting this into (38)' yields

$$2 \operatorname{Re}(g, Hf - af) = 0.$$

Replacing g by ig we deduce that for all g in X,

$$2(g, Hf - af) = 0. \qquad (39)$$

A vector orthogonal to all vectors g is zero; since (39) holds for all g, it follows that

$$Hf - af = 0, \qquad (39)'$$

that is, f is an eigenvector and a is an eigenvalue of H.

We prove now by induction on the dimension n of X that H has a complete set of n
orthogonal eigenvectors in X. We consider the orthogonal complement $X_1$ of f, that
is, all x such that

$$(x, f) = 0. \qquad (39)''$$

Clearly, $\dim X_1 = \dim X - 1$. We claim that H maps the space $X_1$ into itself; that is,
if $x \in X_1$, then (Hx, f) = 0. By self-adjointness and (39)'',

$$(Hx, f) = (x, Hf) = (x, af) = a(x, f) = 0.$$

H restricted to $X_1$ is self-adjoint; since $\dim X_1 = n - 1$, induction on the
dimension of the underlying space shows that H has a full set of eigenvectors on $X_1$.
These together with f give a full set of n orthogonal eigenvectors of H on X.
Instead of arguing by induction we can argue by recursion; we can pose the same
minimum problem in $X_1$ that we have previously posed in the whole space, to
minimize

$$\frac{(x, Hx)}{(x, x)}$$

among all nonzero vectors in $X_1$. Again this minimum value is taken on by some
vector $x = f_2$ in $X_1$, and $f_2$ is an eigenvector of H. The corresponding eigenvalue
is $a_2$:

$$Hf_2 = a_2 f_2,$$

where $a_2$ is the second smallest eigenvalue of H. In this fashion we produce
successively a full set of eigenvectors. Notice that the jth eigenvector goes with the
jth eigenvalue arranged in increasing order. □


In the argument sketched above, the successive eigenvalues, arranged in 
increasing order, are calculated through a sequence of restricted minimum 
problems. We give now a characterization of the jth eigenvalue that makes no 
reference to the eigenvectors belonging to the previous eigenvalues. This 
characterization is due to E. Fischer. 


Theorem 10. Let H be a real symmetric linear map of a real Euclidean space X
of finite dimension. Denote the eigenvalues of H, arranged in increasing order, by
$a_1, \ldots, a_n$. Then

$$a_j = \min_{\dim S = j}\ \max_{x \in S,\ x \ne 0} \frac{(x, Hx)}{(x, x)}, \qquad (40)$$

S linear subspaces of X.

Note. (40) is called the minmax principle.

Proof. We shall show that for any linear subspace S of X of dim S = j,

$$\max_{x \in S,\ x \ne 0} \frac{(x, Hx)}{(x, x)} \ge a_j. \qquad (41)$$

To prove this it suffices to display a single vector $x \ne 0$ in S for which

$$\frac{(x, Hx)}{(x, x)} \ge a_j. \qquad (42)$$

Such an x is one that satisfies the j - 1 linear conditions

$$(x, f_i) = 0, \qquad i = 1, \ldots, j - 1, \qquad (43)$$

where $f_i$ is the ith eigenvector of H. It follows from Corollary A of Theorem 1 in
Chapter 3 that every subspace S of dimension j has a nonzero vector x satisfying the
j - 1 linear conditions (43). The expansion (25)' of such an x in terms of the
eigenvectors of H contains no contribution from the first j - 1 eigenvectors; that is,
in (36), $z_i = 0$ for i < j. It follows then from (36) that for such x, (42) holds. This
completes the proof of (41).

To complete the proof of Theorem 10 we have to exhibit a single subspace S of
dimension j such that

$$a_j \ge \frac{(x, Hx)}{(x, x)} \qquad (44)$$

holds for all nonzero x in S. Such a subspace is the space spanned by $f_1, \ldots, f_j$. Every x in this
space is of form $\sum_{i=1}^{j} z_i f_i$; since $a_i \le a_j$ for $i \le j$, inequality (44) follows
from (36). □


The calculations and arguments presented above show an important property of 
the Rayleigh quotient: 


(i) Every eigenvector h of H is a critical point of $R_H$; that is, the first derivatives
of $R_H(x)$ are zero when x is an eigenvector of H. Conversely, the
eigenvectors are the only critical points of $R_H(x)$.

(ii) The value of the Rayleigh quotient at an eigenvector f is the corresponding
eigenvalue of H:

$$R_H(f) = a \quad \text{when} \quad Hf = af.$$

This observation has the following important consequence:
suppose g is an approximation of an eigenvector f within a deviation of $\epsilon$:

$$\|g - f\| \le \epsilon. \qquad (45)$$

Then $R_H(g)$ is an approximation of the eigenvalue a within $O(\epsilon^2)$:

$$|R_H(g) - a| \le O(\epsilon^2). \qquad (45)'$$

This result is a direct consequence of the Taylor approximation of the function
$R_H(x)$ near the point x = f.

The estimate (45)' is very useful for devising numerical methods to calculate the 
eigenvalues of matrices. 
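The quadratic accuracy stated in (45)' is easy to observe numerically. The sketch below uses a hypothetical 2 x 2 symmetric matrix with eigenvalues 1 and 3 and perturbs the unit eigenvector f belonging to 3 by $\epsilon$ times an orthogonal unit vector; for this particular example the error of the Rayleigh quotient works out to $2\epsilon^2/(1 + \epsilon^2)$. The names `rayleigh` and `eigenvalue_error` are assumptions made for illustration.

```python
import math

def rayleigh(h, x):
    """Rayleigh quotient (x, Hx) / (x, x) for a real symmetric matrix h."""
    hx = [sum(h[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]
    return sum(xi * hxi for xi, hxi in zip(x, hx)) / sum(xi * xi for xi in x)

H = [[2.0, 1.0], [1.0, 2.0]]     # hypothetical example with eigenvalues 1 and 3
s = 1.0 / math.sqrt(2.0)
f = [s, s]                       # unit eigenvector for the eigenvalue 3
e = [s, -s]                      # unit vector orthogonal to f

def eigenvalue_error(eps):
    """|R_H(g) - 3| for the approximate eigenvector g = f + eps * e."""
    g = [fi + eps * ei for fi, ei in zip(f, e)]   # ||g - f|| = eps
    return abs(rayleigh(H, g) - 3.0)

for eps in (1e-1, 1e-2, 1e-3):
    print(eps, eigenvalue_error(eps))             # shrinks like eps squared
```

Reducing the deviation of the eigenvector by a factor of 10 reduces the eigenvalue error by a factor of about 100.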

We now give a useful extension of the variational characterization of the 
eigenvalues of a self-adjoint mapping. In a Euclidean space X, real or complex, we 
consider two self-adjoint mappings, H and M; we assume that the second one, M, is 
positive. 


Definition. A self-adjoint mapping M of a Euclidean space X into itself is called 
positive if for all nonzero x in X 


(x, Mx) > 0. 


It follows from the definition and properties of scalar product that the identity I is 
positive. There are many others; these will be studied systematically in Chapter 10. 
We now form a generalization of the Rayleigh quotient: 


$$R_{H,M}(x) = \frac{(x, Hx)}{(x, Mx)}. \qquad (46)$$

Note that when M = I, we are back at the old Rayleigh quotient. We now pose for
the generalized Rayleigh quotient the same minimum problem that we posed before
for the original Rayleigh quotient: Minimize $R_{H,M}(x)$, that is, find a nonzero vector x
that solves

$$\min_{x \ne 0} \frac{(x, Hx)}{(x, Mx)}. \qquad (47)$$


EXERCISE 5. (a) Show that the minimum problem (47) has a nonzero solution f.
(b) Show that a solution f of the minimum problem (47) satisfies the equation

$$Hf = bMf, \qquad (48)$$

where the scalar b is the value of the minimum (47).
(c) Show that the constrained minimum problem

$$\min_{(y, Mf) = 0} \frac{(y, Hy)}{(y, My)} \qquad (47)'$$

has a nonzero solution g.
(d) Show that a solution g of the minimum problem (47)' satisfies the equation

$$Hg = cMg, \qquad (48)'$$

where the scalar c is the value of the minimum (47)'.
Theorem 11. Let X be a finite-dimensional Euclidean space, let H and M be
two self-adjoint mappings of X into itself, and let M be positive. Then there exists a
basis $f_1, \ldots, f_n$ of X where each $f_i$ satisfies an equation of the form

$$Hf_i = b_i Mf_i, \qquad b_i \text{ real}, \qquad (49)$$

and

$$(f_i, Mf_j) = 0 \quad \text{for } i \ne j.$$

EXERCISE 6. Prove Theorem 11.

EXERCISE 7. Characterize the numbers $b_i$ in Theorem 11 by a minimax principle
similar to (40).

The following useful result is an immediate consequence of Theorem 11.

Theorem 11'. Let H and M be self-adjoint, M positive. Then all the eigenvalues
of $M^{-1}H$ are real. If H is positive, all eigenvalues of $M^{-1}H$ are positive.

EXERCISE 8. Prove Theorem 11'.

EXERCISE 9. Give an example to show that Theorem 11' is false if M is not
positive.


We recall from formula (32) of Chapter 7 the definition of the norm of a linear
mapping A of a Euclidean space X into itself:

$$\|A\| = \max_{\|x\| = 1} \|Ax\|.$$

When the mapping is normal, that is, commutes with its adjoint, we can express its
norm as follows.

Theorem 12. Suppose N is a normal mapping of a Euclidean space X into itself.
Then

$$\|N\| = \max_j |n_j|, \qquad (50)$$

where the $n_j$ are the eigenvalues of N.

EXERCISE 10. Prove Theorem 12. (Hint: Use Theorem 8.)

EXERCISE 11. We define the cyclic shift mapping S, acting on vectors in $\mathbb{C}^n$, by

$$S(a_1, a_2, \ldots, a_n) = (a_n, a_1, \ldots, a_{n-1}).$$

(a) Prove that S is an isometry in the Euclidean norm.
(b) Determine the eigenvalues and eigenvectors of S.
(c) Verify that the eigenvectors are orthogonal.

Remark. The expansion of a vector v in terms of the eigenvectors of S is called
the finite Fourier transform of v. See Appendix 9.
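The connection between the cyclic shift and the finite Fourier transform can be explored numerically. The sketch below assumes the standard candidates for eigenvectors, the vectors with components $\omega^{jk}$ where $\omega = e^{2\pi i/n}$, and checks that each is mapped by S to a multiple of itself and that distinct ones are orthogonal. It is a numerical check under that assumption, not a proof; all function names are hypothetical.

```python
import cmath

def shift(a):
    """Cyclic shift S(a_1, ..., a_n) = (a_n, a_1, ..., a_{n-1})."""
    return [a[-1]] + a[:-1]

def fourier_vector(n, k):
    """Vector with components w**(j*k), j = 0, ..., n-1, where w = exp(2*pi*i/n)."""
    w = cmath.exp(2j * cmath.pi / n)
    return [w ** (j * k) for j in range(n)]

def inner(u, v):
    """Conjugate-symmetric scalar product, linear in the first argument."""
    return sum(p * q.conjugate() for p, q in zip(u, v))

def max_residual(n):
    """Largest entry of S v - lam v over all k, with lam = exp(-2*pi*i*k/n)."""
    worst = 0.0
    for k in range(n):
        v = fourier_vector(n, k)
        lam = cmath.exp(-2j * cmath.pi * k / n)
        worst = max(worst, max(abs(s - lam * x) for s, x in zip(shift(v), v)))
    return worst

print(max_residual(5))                                     # rounding-level residual
print(abs(inner(fourier_vector(5, 1), fourier_vector(5, 3))))
```

All the candidate eigenvalues have absolute value 1, as Theorem 9 requires for an isometry.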


Theorem 13. Let A be a linear mapping of a finite-dimensional Euclidean
space X into another finite-dimensional Euclidean space U. The norm $\|A\|$ of A
equals the square root of the largest eigenvalue of $A^*A$.

Proof. $\|Ax\|^2 = (Ax, Ax) = (x, A^*Ax)$. According to the Schwarz inequality,
the right-hand side is $\le \|x\|\,\|A^*Ax\|$. It follows that for unit vectors x, $\|x\| = 1$,

$$\|Ax\|^2 \le \|A^*Ax\|. \qquad (51)$$

$A^*A$ is a self-adjoint mapping; according to formula (37)', we have

$$\max_{\|x\| = 1} \|A^*Ax\| = a_{\max},$$

where $a_{\max}$ is the largest eigenvalue of $A^*A$. Combining this with (51), we conclude
that $\|A\|^2 \le a_{\max}$. To show that equality holds, we note that for the eigenvector f of
$A^*A$, $A^*Af = a_{\max} f$ and so in the Schwarz inequality which gave (51), the sign of
equality holds. □
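For a real 2 x 2 matrix, Theorem 13 gives the norm in closed form, because the largest eigenvalue of the symmetric matrix $A^*A$ can be found from a quadratic equation. The following is a minimal sketch; the function names and the example matrix are assumptions for illustration, and the example is deliberately not one of the matrices in the exercises.

```python
import math

def operator_norm_2x2(a):
    """Square root of the largest eigenvalue of A^T A for a real 2 x 2 matrix a."""
    # Entries of the symmetric matrix A^T A = [[p, q], [q, r]].
    p = a[0][0] ** 2 + a[1][0] ** 2
    q = a[0][0] * a[0][1] + a[1][0] * a[1][1]
    r = a[0][1] ** 2 + a[1][1] ** 2
    largest = (p + r) / 2 + math.sqrt(((p - r) / 2) ** 2 + q ** 2)
    return math.sqrt(largest)

def hilbert_schmidt_norm(a):
    """Square root of the sum of the squares of all entries."""
    return math.sqrt(sum(e ** 2 for row in a for e in row))

A = [[3.0, 0.0], [4.0, 5.0]]            # hypothetical example
norm = operator_norm_2x2(A)             # sqrt(45) for this matrix
print(norm, hilbert_schmidt_norm(A))    # the norm does not exceed the Hilbert-Schmidt norm
```

For this matrix the maximum of $\|Ax\|$ over unit vectors is attained at $x = (1, 1)/\sqrt{2}$, an eigenvector of $A^*A$ for its largest eigenvalue 45.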


EXERCISE 12. (i) What is the norm of the matrix 


in the standard Euclidean structure? 
(ii) Compare the value of || A || with the upper and lower bounds of || A || asked 
for in Exercise 19 of Chapter 7. 


EXERCISE 13. What is the norm of the matrix

$$\begin{pmatrix} 1 & 0 & -1 \\ 2 & 3 & 0 \end{pmatrix}$$

in the standard Euclidean structures of $\mathbb{R}^2$ and $\mathbb{R}^3$?


CHAPTER 9 


Calculus of Vector- and 
Matrix- Valued Functions 


In Section 1 of this chapter we develop the calculus of vector- and matrix-valued
functions. There are two ways of going about it: by representing vectors and 
matrices in terms of their components and entries with respect to some basis and 
using the calculus of number-valued functions or by redoing the theory in the context 
of linear spaces. Here we opt for the second approach, because of its simplicity and 
because it is the conceptual way to think about the subject; but we reserve the right to 
go to components when necessary. 

In what follows, the field of scalars is the real or complex numbers. In Chapter 7 
we defined the length of vectors and the norm of matrices; see (1) and (32). This 
made it possible to define convergence of sequences as follows. 


(i) A sequence $x_k$ of vectors in $\mathbb{R}^n$ converges to the vector x if

$$\lim_{k \to \infty} \|x_k - x\| = 0.$$

(ii) A sequence $A_k$ of n x n matrices converges to A if

$$\lim_{k \to \infty} \|A_k - A\| = 0.$$


We could have defined convergence of sequences of vectors and matrices,
without introducing the notion of size, by requiring that each component of $x_k$ tend
to the corresponding component of x and, in the case of matrices, that each entry of
$A_k$ tend to the corresponding entry of A. But using the notion of size introduces a
simplification in notation and thinking, and is an aid in proof. There is more about
size in Chapters 14 and 15.
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1. THE CALCULUS OF VECTOR- AND MATRIX-VALUED 
FUNCTIONS 


Let $x(t)$ be a vector-valued function of the real variable $t$, defined, say, for $t$ in (0, 1).
We say that $x(t)$ is continuous at $t_0$ if

$$\lim_{t\to t_0} \|x(t) - x(t_0)\| = 0. \tag{1}$$

We say that $x$ is differentiable at $t_0$, with derivative $\dot{x}(t_0)$, if

$$\lim_{h\to 0} \left\| \frac{x(t_0+h) - x(t_0)}{h} - \dot{x}(t_0) \right\| = 0. \tag{1'}$$

Here we have abbreviated the derivative by a dot:

$$\dot{x}(t) = \frac{d}{dt}x(t).$$


The notion of continuity and differentiability of matrix-valued functions is defined 
similarly. 

The fundamental lemma of differentiation holds for vector- and matrix-valued
functions.


Theorem 1. If $\dot{x}(t) = 0$ for all $t$ in (0, 1), then $x(t)$ is constant.


EXERCISE 1. Prove the fundamental lemma for vector-valued functions. (Hint:
Show that for every vector $y$, $(x(t), y)$ is constant.)


We turn to the rules of differentiation.

Linearity. (i) The sum of two differentiable functions is differentiable, and

$$\frac{d}{dt}(x + y) = \frac{d}{dt}x + \frac{d}{dt}y.$$

(ii) The constant multiple of a differentiable function is differentiable, and

$$\frac{d}{dt}\bigl(k\,x(t)\bigr) = k\,\frac{d}{dt}x(t).$$

Similarly for matrix-valued differentiable functions,

(iii)
$$\frac{d}{dt}\bigl(A(t) + B(t)\bigr) = \frac{d}{dt}A(t) + \frac{d}{dt}B(t).$$

(iv) If $A$ is independent of $t$, then we have

$$\frac{d}{dt}A\,x(t) = A\,\frac{d}{dt}x(t).$$


The proof is the same as in scalar calculus. 

For vector- and matrix-valued functions there is a further manifestation of the
linearity of the derivative: Suppose that $l$ is a fixed linear function defined on $\mathbb{R}^n$ and
that $x(t)$ is a differentiable vector-valued function. Then $l(x(t))$ is a differentiable
function, and

$$\frac{d}{dt}\,l(x(t)) = l\!\left(\frac{d}{dt}x(t)\right). \tag{2}$$

The same result applies to linear functions of matrices. In particular the trace,
defined by (35) in Chapter 5, is such a linear function. So we have, for every
differentiable matrix function $A(t)$, that

$$\frac{d}{dt}\operatorname{tr}A(t) = \operatorname{tr}\left(\frac{d}{dt}A(t)\right). \tag{2'}$$

The rule (sometimes called the Leibniz rule) for differentiating a product is the
same as in elementary calculus. Here, however, we have at least five kinds of
products and therefore five versions of rules.


Product Rules


(i) The product of a scalar function and a vector function:

$$\frac{d}{dt}\bigl[k(t)x(t)\bigr] = \left(\frac{dk}{dt}\right)x(t) + k(t)\,\frac{d}{dt}x(t).$$

(ii) The product of a matrix function times a vector function:

$$\frac{d}{dt}\bigl[A(t)x(t)\bigr] = \left(\frac{d}{dt}A(t)\right)x(t) + A(t)\,\frac{d}{dt}x(t).$$

(iii) The product of two matrix-valued functions:

$$\frac{d}{dt}\bigl[A(t)B(t)\bigr] = \left[\frac{d}{dt}A(t)\right]B(t) + A(t)\left[\frac{d}{dt}B(t)\right].$$

(iv) The product of a scalar-valued and a matrix-valued function:

$$\frac{d}{dt}\bigl[k(t)A(t)\bigr] = \left[\frac{dk}{dt}\right]A(t) + k(t)\,\frac{d}{dt}A(t).$$

(v) The scalar product of two vector functions:

$$\frac{d}{dt}\bigl(y(t), x(t)\bigr) = \left(\frac{d}{dt}y(t), x(t)\right) + \left(y(t), \frac{d}{dt}x(t)\right).$$


The proof of all these is the same as in the case of ordinary numerical functions.
The rule for differentiating the inverse of a matrix function resembles the calculus
rule for differentiating the reciprocal of a function, with one subtle twist.


Theorem 2. Let $A(t)$ be a matrix-valued function, differentiable and invertible.
Then $A^{-1}(t)$ also is differentiable, and

$$\frac{d}{dt}A^{-1} = -A^{-1}\left(\frac{d}{dt}A\right)A^{-1}. \tag{3}$$


Proof. The following identity is easily verified:

$$A^{-1}(t+h) - A^{-1}(t) = A^{-1}(t+h)\,\bigl[A(t) - A(t+h)\bigr]\,A^{-1}(t).$$

Dividing both sides by $h$ and letting $h \to 0$ yields (3). □

EXERCISE 2. Derive formula (3) using product rule (iii).
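
As a numerical sanity check of formula (3), the following sketch compares a central difference quotient of the inverse with the right-hand side of (3). The particular 2 x 2 function `A(t)`, the step size, and all helper names are assumptions chosen for illustration; they are not taken from the text.

```python
# Finite-difference check of formula (3) on a hypothetical 2 x 2 example.

def A(t):
    """An arbitrary differentiable matrix function, invertible near t = 0.5."""
    return [[2.0 + t, t * t], [1.0, 3.0 + t]]

def A_dot(t):
    """Entrywise derivative of A(t), computed by hand."""
    return [[1.0, 2.0 * t], [0.0, 1.0]]

def inv2(M):
    """Inverse of a 2 x 2 matrix by the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(X, Y):
    """Product of two 2 x 2 matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse_derivative_error(t, h=1e-6):
    """Largest entrywise gap between a central difference of the inverse
    and the right-hand side of (3), which is -A^{-1} (dA/dt) A^{-1}."""
    plus, minus = inv2(A(t + h)), inv2(A(t - h))
    A_inv = inv2(A(t))
    product = matmul(matmul(A_inv, A_dot(t)), A_inv)
    return max(abs((plus[i][j] - minus[i][j]) / (2 * h) + product[i][j])
               for i in range(2) for j in range(2))

print(inverse_derivative_error(0.5))
```

The printed gap should be at the level of the discretization and rounding error. Note that the order of the three factors matters: the derivative of $A$ sits between the two inverses.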


The chain rule of calculus says that if $f$ and $a$ are scalar-valued differentiable
functions, so is their composite, $f(a(t))$, and

$$\frac{d}{dt}f(a(t)) = f'(a)\,\frac{da}{dt}, \tag{4}$$

where $f'$ is the derivative of $f$. We show that the chain rule fails for matrix-valued
functions. Take $f(a) = a^2$; by the product rule,

$$\frac{d}{dt}A^2 = A\,\frac{d}{dt}A + \left(\frac{d}{dt}A\right)A,$$

certainly not the same as (4). More generally, we claim that for any positive integer
power $k$,

$$\frac{d}{dt}A^k = \dot{A}A^{k-1} + A\dot{A}A^{k-2} + \cdots + A^{k-1}\dot{A}. \tag{5}$$

This is easily proved by induction: We write

$$A^k = A\,A^{k-1}$$

and apply the product rule

$$\frac{d}{dt}A^k = \dot{A}A^{k-1} + A\,\frac{d}{dt}A^{k-1}.$$


Theorem 3. Let $p$ be any polynomial, let $A(t)$ be a square matrix-valued
function that is differentiable; denote the derivative of $A$ with respect to $t$ as $\dot{A}$.

(a) If for a particular value of $t$ the matrices $A(t)$ and $\dot{A}(t)$ commute, then the
chain rule in form (4) holds at $t$:

$$\frac{d}{dt}p(A) = p'(A)\dot{A}. \tag{6}$$

(b) Even if $A(t)$ and $\dot{A}(t)$ do not commute, a trace of the chain rule remains:

$$\frac{d}{dt}\operatorname{tr}p(A) = \operatorname{tr}\bigl(p'(A)\dot{A}\bigr). \tag{6'}$$


Proof. Suppose $A$ and $\dot{A}$ commute; then (5) can be rewritten as

$$\frac{d}{dt}A^k = kA^{k-1}\dot{A}.$$

This is formula (6) for $p(s) = s^k$; since all polynomials are linear combinations of
powers, using the linearity of differentiation we deduce (6) for all polynomials.

For noncommuting $A$ and $\dot{A}$ we take the trace of (5). According to Theorem 6 of
Chapter 5, trace is commutative:

$$\operatorname{tr}\bigl(A^j\dot{A}A^{k-j-1}\bigr) = \operatorname{tr}\bigl(A^{k-j-1}A^j\dot{A}\bigr) = \operatorname{tr}\bigl(A^{k-1}\dot{A}\bigr).$$

So we deduce that

$$\operatorname{tr}\frac{d}{dt}A^k = k\operatorname{tr}\bigl(A^{k-1}\dot{A}\bigr).$$

Since trace and differentiation commute [see (2)'], we deduce formula (6)' for
$p(s) = s^k$. The extension to arbitrary polynomials goes as before. □
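
A small exact computation may make Theorem 3 concrete for $p(s) = s^2$. The matrix function used here, with rows $(t, 1)$ and $(t^2, 0)$, is a hypothetical example chosen so that $A$ and its derivative do not commute; the helper names are likewise illustrative.

```python
# Exact integer check: formula (5) for k = 2 versus the naive chain rule 2*A*A_dot.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add(X, Y):
    return [[X[i][j] + Y[i][j] for j in range(2)] for i in range(2)]

def trace(X):
    return X[0][0] + X[1][1]

def square_rule_terms(t):
    """Return (A_dot*A + A*A_dot, 2*A*A_dot) for A(t) = [[t, 1], [t^2, 0]]."""
    A = [[t, 1], [t * t, 0]]
    A_dot = [[1, 0], [2 * t, 0]]          # entrywise derivative of A
    true_derivative = add(matmul(A_dot, A), matmul(A, A_dot))
    naive_chain_rule = [[2 * x for x in row] for row in matmul(A, A_dot)]
    return true_derivative, naive_chain_rule

true_d, naive_d = square_rule_terms(1)
print(true_d, naive_d)                    # the two matrices differ
print(trace(true_d), trace(naive_d))      # yet their traces agree
```

Since $A(t)^2$ has rows $(2t^2, t)$ and $(t^3, t^2)$, its entrywise derivative at $t = 1$ has rows $(4, 1)$ and $(3, 2)$, which is what the first matrix reproduces; only the traces of the two expressions coincide, as part (b) asserts.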


We extend now the product rule to multilinear functions $M(a_1, \ldots, a_k)$. Suppose
$x_1, \ldots, x_k$ are differentiable vector functions. Then $M(x_1, \ldots, x_k)$ is differentiable,
and

$$\frac{d}{dt}M(x_1, \ldots, x_k) = M(\dot{x}_1, x_2, \ldots, x_k) + \cdots + M(x_1, \ldots, x_{k-1}, \dot{x}_k). \tag{7}$$

The proof is straightforward: since $M$ is multilinear,

$$\begin{aligned}
M(x_1(t+h), \ldots, x_k(t+h)) &- M(x_1(t), \ldots, x_k(t)) \\
&= M(x_1(t+h) - x_1(t), x_2(t+h), \ldots, x_k(t+h)) \\
&\quad + M(x_1(t), x_2(t+h) - x_2(t), x_3(t+h), \ldots, x_k(t+h)) \\
&\quad + \cdots + M(x_1(t), \ldots, x_{k-1}(t), x_k(t+h) - x_k(t)).
\end{aligned}$$

Dividing by $h$ and letting $h$ tend to zero gives (7).


The most important application of (7) is to the function $D$, the determinant,
defined in Chapter 5:

$$\frac{d}{dt}D(x_1, \ldots, x_n) = D(\dot{x}_1, x_2, \ldots, x_n) + \cdots + D(x_1, \ldots, x_{n-1}, \dot{x}_n). \tag{8}$$

We now show how to recast this formula to involve a matrix $X$ itself, not its
columns. We start with the case when $X(0) = I$, that is, $x_j(0) = e_j$. In this case the
determinants on the right in (8) are easily evaluated at $t = 0$:

$$\begin{aligned}
D(\dot{x}_1(0), e_2, \ldots, e_n) &= \dot{x}_{11}(0), \\
&\;\;\vdots \\
D(e_1, \ldots, e_{n-1}, \dot{x}_n(0)) &= \dot{x}_{nn}(0).
\end{aligned}$$

Setting this into (8) we deduce that if $X(t)$ is a differentiable matrix-valued function
and $X(0) = I$, then

$$\frac{d}{dt}\det X(t)\Big|_{t=0} = \operatorname{tr}\dot{X}(0). \tag{8'}$$


Suppose $Y(t)$ is a differentiable square matrix-valued function, which is
invertible. We define $X(t)$ as $Y(0)^{-1}Y(t)$, and write

$$Y(t) = Y(0)X(t); \tag{9}$$

clearly, $X(0) = I$, so formula (8)' is applicable. Taking the determinant of (9), we get
by the product rule for determinants that

$$\det Y(t) = \det Y(0)\,\det X(t). \tag{9'}$$

Setting (9) and (9)' into (8)', we get

$$[\det Y(0)]^{-1}\,\frac{d}{dt}\det Y(t)\Big|_{t=0} = \operatorname{tr}\bigl[Y^{-1}(0)\dot{Y}(0)\bigr].$$

We can rewrite this as

$$\frac{d}{dt}\log\det Y(t)\Big|_{t=0} = \operatorname{tr}\bigl[Y^{-1}(t)\dot{Y}(t)\bigr]_{t=0}.$$

Since now there is nothing special about $t = 0$, this relation holds for all $t$:


Theorem 4. Let $Y(t)$ be a differentiable square matrix-valued function. Then
for those values of $t$ for which $Y(t)$ is invertible,

$$\frac{d}{dt}\log\det Y = \operatorname{tr}\left(Y^{-1}\frac{d}{dt}Y\right). \tag{10}$$


The importance of this result lies in the connection it establishes between 
determinant and trace. 
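
The relation (10) between determinant and trace is easy to test numerically. The sketch below does so for a hypothetical 2 x 2 function `Y(t)` with positive determinant near the chosen point; the example matrix, the step size, and the function names are assumptions made for illustration only.

```python
import math

def Y(t):
    """Hypothetical example; det Y(t) = 2 + 2t - t^3 is positive near t = 0.5."""
    return [[1.0 + t, t], [t * t, 2.0]]

def Y_dot(t):
    """Entrywise derivative of Y(t), computed by hand."""
    return [[1.0, 1.0], [2.0 * t, 0.0]]

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def inv2(M):
    d = det2(M)
    return [[M[1][1] / d, -M[0][1] / d], [-M[1][0] / d, M[0][0] / d]]

def log_det_derivative(t, h=1e-6):
    """Left-hand side of (10), approximated by a central difference."""
    return (math.log(det2(Y(t + h))) - math.log(det2(Y(t - h)))) / (2 * h)

def trace_formula(t):
    """Right-hand side of (10): the trace of Y^{-1} times dY/dt."""
    Y_inv, Yd = inv2(Y(t)), Y_dot(t)
    return sum(Y_inv[i][k] * Yd[k][i] for i in range(2) for k in range(2))

print(log_det_derivative(0.5), trace_formula(0.5))
```

For this example both sides can also be found by hand: the right-hand side equals $(2 - 3t^2)/(2 + 2t - t^3)$, which is $1.25/2.875$ at $t = 0.5$.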

So far we have defined $f(A)$ for matrix arguments when $f$ is a polynomial. We
show now an example of a nonpolynomial $f$ for which $f(A)$ can be defined. We take
$f(s) = e^s$, defined by the Taylor series

$$e^s = \sum_{k=0}^{\infty}\frac{s^k}{k!}. \tag{11}$$

We claim that the Taylor series also serves to define $e^A$ for any square matrix $A$:

$$e^A = \sum_{k=0}^{\infty}\frac{A^k}{k!}. \tag{11'}$$

The proof of convergence is the same as in the scalar case: it boils down to showing
that the difference of the partial sums tends to zero. That is, denote by $e_m(A)$ the $m$th
partial sum:

$$e_m(A) = \sum_{k=0}^{m}\frac{A^k}{k!}; \tag{12}$$

then, for $l < m$,

$$e_m(A) - e_l(A) = \sum_{k=l+1}^{m}\frac{A^k}{k!}. \tag{13}$$

Using the multiplicative and additive inequalities for the norm of matrices developed
in Chapter 7, Theorem 14, we deduce that

$$\|e_m(A) - e_l(A)\| \le \sum_{k=l+1}^{m}\frac{\|A\|^k}{k!}. \tag{13'}$$

We are now back in the scalar case, and therefore can estimate the right-hand side
and assert that as $l$ and $m$ tend to infinity, the right-hand side of (13)' tends to zero,
uniformly for all matrices whose norm $\|A\|$ is less than any preassigned constant.
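
The convergence argument can be watched in action. The sketch below forms the partial sums of the exponential series and compares the norm of a difference of two partial sums with the scalar bound. Two assumptions are made for convenience: the Frobenius norm stands in for the matrix norm (it satisfies the same multiplicative and additive inequalities and is trivial to compute), and the test matrix `J` is an arbitrary choice.

```python
import math

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def frobenius_norm(X):
    """Square root of the sum of squares of the entries (an assumed choice of norm)."""
    return math.sqrt(sum(x * x for row in X for x in row))

def exp_partial_sum(A, m):
    """The m-th partial sum of the exponential series: sum of A^k / k!, k = 0..m."""
    n = len(A)
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = [row[:] for row in term]
    for k in range(1, m + 1):
        # After this step, term equals A^k / k!.
        term = [[x / k for x in row] for row in matmul(term, A)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total

def tail_gap_and_bound(A, l, m):
    """Norm of e_m(A) - e_l(A) and the scalar bound on it, for l < m."""
    E_l, E_m = exp_partial_sum(A, l), exp_partial_sum(A, m)
    n = len(A)
    diff = [[E_m[i][j] - E_l[i][j] for j in range(n)] for i in range(n)]
    a = frobenius_norm(A)
    bound = sum(a ** k / math.factorial(k) for k in range(l + 1, m + 1))
    return frobenius_norm(diff), bound

J = [[0.0, 1.0], [-1.0, 0.0]]        # example matrix with J^2 = -I
gap, bound = tail_gap_and_bound(J, 3, 10)
print(gap, bound)
print(exp_partial_sum(J, 20))
```

Because $J^2 = -I$, the series for this particular matrix sums to $(\cos 1)I + (\sin 1)J$, so the last line should print a rotation matrix to within rounding error.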

The matrix exponential function has some but not all properties of the scalar 
exponential function. 


Theorem 5. (a) If $A$ and $B$ are commuting square matrices,

$$e^{A+B} = e^Ae^B.$$

(b) If $A$ and $B$ do not commute, then in general

$$e^{A+B} \ne e^Ae^B.$$

(c) If $A(t)$ depends differentiably on $t$, so does $e^{A(t)}$.
(d) If for a particular value of $t$, $A(t)$ and $\dot{A}(t)$ commute, then $(d/dt)e^{A} = e^{A}\dot{A}$.
(e) If $A$ is anti-self-adjoint, $A^* = -A$, then $e^A$ is unitary.


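Proof. Part (a) follows from the definition (11)' of $e^{A+B}$, after $(A+B)^k$ is
expressed as $\sum_j \binom{k}{j}A^jB^{k-j}$, valid for commuting variables.

That commutativity is used essentially in the proof of part (a) makes part (b)
plausible. We shall not make the statement more precise; we content ourselves with
giving a single example:

$$A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}.$$

It is easy to see that $A^2 = 0$, $B^2 = 0$, so by definition (11)',

$$e^A = I + A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \qquad e^B = I + B = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}.$$

A brief calculation shows that

$$e^Ae^B = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}, \qquad e^Be^A = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix};$$

since these products are different, at least one must differ from $e^{A+B}$; actually,
both do.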
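
The example can be confirmed by summing the series numerically. In the sketch below, `exp_series` is an illustrative helper that truncates the Taylor series after a fixed number of terms (an assumption that is harmless here because the terms decay factorially); it is not a library routine.

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def exp_series(A, terms=30):
    """Truncated Taylor series for the matrix exponential: sum of A^k / k!."""
    n = len(A)
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = [row[:] for row in term]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in matmul(term, A)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total

def max_abs_diff(X, Y):
    return max(abs(X[i][j] - Y[i][j]) for i in range(len(X)) for j in range(len(X)))

A = [[0.0, 1.0], [0.0, 0.0]]
B = [[0.0, 0.0], [1.0, 0.0]]
A_plus_B = [[0.0, 1.0], [1.0, 0.0]]

eA_eB = matmul(exp_series(A), exp_series(B))
eB_eA = matmul(exp_series(B), exp_series(A))
e_sum = exp_series(A_plus_B)

print(eA_eB, eB_eA)
print(max_abs_diff(e_sum, eA_eB), max_abs_diff(e_sum, eB_eA))
```

Both printed differences are far from zero, in agreement with the claim that neither product equals the exponential of the sum.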


EXERCISE 3. Calculate

$$e^{A+B} = \exp\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$

To prove (c) we rely on the following matrix analogue of an important property of
differentiation: Let $\{E_m(t)\}$ be a sequence of differentiable matrix-valued functions
defined on an interval, with these properties:

(i) $E_m(t)$ converges uniformly to a limit function $E(t)$.
(ii) The derivatives $\dot{E}_m(t)$ converge uniformly to a limit function $F(t)$.

Conclusion: $E$ is differentiable, and $\dot{E} = F$.

EXERCISE 4. Prove the proposition stated in the Conclusion.


We apply the same principle to $E_m(t) = e_m(A(t))$. We have already shown that
$E_m(t)$ tends uniformly to $e^{A(t)}$; a similar argument shows that $\dot{E}_m(t)$ converges.

EXERCISE 5. Carry out the details of the argument that $\dot{E}_m(t)$ converges.


Part (d) of Theorem 5 follows from the explicit formula for $(d/dt)e^{A(t)}$, obtained
by differentiating the series (11)' termwise.

To prove part (e) we start with the definition (11)' of $e^A$. Since forming the adjoint
is a linear and continuous operation, we can take the adjoint of the infinite series in
(11)' term by term:

$$(e^A)^* = \sum_{k=0}^{\infty}\left(\frac{A^k}{k!}\right)^* = \sum_{k=0}^{\infty}\frac{(A^*)^k}{k!} = e^{A^*} = e^{-A}.$$

It follows, using part (a), that

$$(e^A)^*e^A = e^{-A}e^A = e^0 = I.$$

According to formula (45) of Chapter 7, this shows that $e^A$ is unitary. □

EXERCISE 6. Apply formula (10) to $Y(t) = e^{At}$ and show that

$$\det e^A = e^{\operatorname{tr}A}.$$


EXERCISE 7. Prove that all eigenvalues of $e^A$ are of the form $e^a$, $a$ an eigenvalue
of $A$. Hint: Use Theorem 4 of Chapter 6, along with Theorem 6 below.


We remind the reader that for self-adjoint matrices $H$ we have already in
Chapter 8 defined $f(H)$ for a broad class of functions; see formula (33)'.


2. SIMPLE EIGENVALUES OF A MATRIX


In this section we shall study the manner in which the eigenvalues of a matrix
depend on the matrix. We take the field of scalars to be $\mathbb{C}$.


Theorem 6. The eigenvalues depend continuously on the matrix in the
following sense: If $\{A_m\}$ is a convergent sequence of square matrices, in the sense
that all entries of $A_m$ converge to the corresponding entry of $A$, then the set of
eigenvalues of $A_m$ converges to the set of eigenvalues of $A$. That is, for every $\epsilon > 0$
there is a $k$ such that all eigenvalues of $A_m$ are, for $m > k$, contained in discs of radius
$\epsilon$ centered at the eigenvalues of $A$.


Proof. The eigenvalues of $A_m$ are the roots of the characteristic polynomial
$p_m(s) = \det(sI - A_m)$. Since $A_m$ tends to $A$, all entries of $A_m$ tend to the
corresponding entries of $A$; from this it follows that the coefficients of $p_m$ tend to
the coefficients of $p$. Since the roots of polynomials depend continuously on the
coefficients, Theorem 6 follows. □


Next we investigate the differentiability of the dependence of the eigenvalues on 
the matrix. There are several ways of formulating such a result, for example, in the 
following theorem. 


Theorem 7. Let $A(t)$ be a differentiable square matrix-valued function of the
real variable $t$. Suppose that $A(0)$ has an eigenvalue $a_0$ of multiplicity one, in the
sense that $a_0$ is a simple root of the characteristic polynomial of $A(0)$. Then for $t$
small enough, $A(t)$ has an eigenvalue $a(t)$ that depends differentiably on $t$, and which
equals $a_0$ at zero, that is, $a(0) = a_0$.


Proof. The characteristic polynomial of $A(t)$ is

$$\det(sI - A(t)) = p(s, t),$$

a polynomial of degree $n$ in $s$ whose coefficients are differentiable functions of $t$. The
assumption that $a_0$ is a simple root of the characteristic polynomial of $A(0)$ means that

$$p(a_0, 0) = 0, \qquad \frac{\partial}{\partial s}p(s, 0)\Big|_{s=a_0} \ne 0.$$

According to the implicit function theorem, under these conditions the equation
$p(s, t) = 0$ has a solution $s = a(t)$ in a neighborhood of $t = 0$ that depends
differentiably on $t$. □


Next we show that under the same conditions as in Theorem 7, the eigenvector
pertaining to the eigenvalue $a(t)$ can be chosen to depend differentiably on $t$. We say
"can be chosen" because an eigenvector is determined only up to a scalar factor; by
inserting a scalar factor $k(t)$ that is a nondifferentiable function of $t$ we could, with
malice aforethought, spoil differentiability (and even continuity).


Theorem 8. Let $A(t)$ be a differentiable matrix-valued function of $t$, $a(t)$ an
eigenvalue of $A(t)$ of multiplicity one. Then we can choose an eigenvector $h(t)$ of
$A(t)$ pertaining to the eigenvalue $a(t)$ to depend differentiably on $t$.

Proof. We need the following lemma.


Lemma 9. Let $A$ be an $n \times n$ matrix, $p$ its characteristic polynomial, $a$ some
simple root of $p$. Then at least one of the $(n-1) \times (n-1)$ principal minors of
$A - aI$ has nonzero determinant, where the $i$th principal minor is the matrix
remaining when the $i$th row and $i$th column of $A$ are removed.


Proof. We may, at the cost of subtracting $aI$ from $A$, take the eigenvalue to be
zero. The condition that 0 is a simple root of $p(s)$ means that $p(0) = 0$,
$(dp/ds)(0) \ne 0$. To compute the derivative of $p$ we denote by $c_1, \ldots, c_n$ the
columns of $A$, and by $e_1, \ldots, e_n$ the unit vectors. Then

$$sI - A = (se_1 - c_1, se_2 - c_2, \ldots, se_n - c_n).$$

Now we use formula (8) for the derivative of a determinant:

$$\frac{dp}{ds}(0) = \frac{d}{ds}\det(sI - A)\Big|_{s=0}
= \det(e_1, -c_2, \ldots, -c_n) + \cdots + \det(-c_1, -c_2, \ldots, -c_{n-1}, e_n).$$

Using Lemma 2 of Chapter 5 for the determinants on the right-hand side we see that
$(dp/ds)(0)$ is $(-1)^{n-1}$ times the sum of the determinants of the $(n-1) \times (n-1)$
principal minors. Since $(dp/ds)(0) \ne 0$, at least one of the determinants of these
principal minors is nonzero. □


Let $A$ be a matrix as in Lemma 9 and take the eigenvalue $a$ to be zero. Then one of
the principal $(n-1) \times (n-1)$ minors of $A$, say the $i$th, has nonzero determinant.
We claim that the $i$th component of an eigenvector $h$ of $A$ pertaining to the
eigenvalue $a$ is nonzero. Suppose it were zero; denote by $h^{(i)}$ the vector obtained from $h$ by
omitting the $i$th component, and by $A_{ii}$ the $i$th principal minor of $A$. Then $h^{(i)}$
satisfies

$$A_{ii}h^{(i)} = 0. \tag{14}$$

Since $A_{ii}$ has determinant not equal to 0, $A_{ii}$ is, according to Theorem 5 of Chapter 5,
invertible. But then according to (14), $h^{(i)} = 0$. Since the $i$th component was assumed to be zero, that
would make $h = 0$, a contradiction, since an eigenvector is not equal to 0. Having
shown that the $i$th component of $h$ is not equal to 0, we set it equal to 1 as a way of
normalizing $h$. For the remaining components we have now an inhomogeneous
system of equations:

$$A_{ii}h^{(i)} = c^{(i)}, \tag{14'}$$

where $c^{(i)}$ is $-1$ times the $i$th column of $A$, with the $i$th component removed. So

$$h^{(i)} = A_{ii}^{-1}c^{(i)}. \tag{15}$$
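
The normalization just described can be traced through on a small example. The sketch below uses a 3 x 3 integer matrix for which 0 is a simple eigenvalue (an example chosen for illustration, not taken from the text), sets one component of the eigenvector equal to 1, and solves the remaining 2 x 2 inhomogeneous system by Cramer's rule in exact rational arithmetic.

```python
from fractions import Fraction

# Hypothetical example: det A = 0 and 0 is a simple eigenvalue of A.
A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

def principal_minor(M, i):
    """Matrix left after deleting row i and column i (indices are 0-based)."""
    n = len(M)
    return [[M[r][c] for c in range(n) if c != i] for r in range(n) if r != i]

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def normalized_null_vector(M, i):
    """Set component i equal to 1 and solve the reduced 2 x 2 system for the rest."""
    minor = principal_minor(M, i)
    d = det2(minor)
    if d == 0:
        raise ValueError("this principal minor is singular; try another index")
    # Right-hand side: -1 times column i of M, with component i removed.
    c = [-M[r][i] for r in range(3) if r != i]
    u = Fraction(c[0] * minor[1][1] - minor[0][1] * c[1], d)   # Cramer's rule
    v = Fraction(minor[0][0] * c[1] - minor[1][0] * c[0], d)
    rest = [u, v]
    return rest[:i] + [Fraction(1)] + rest[i:]

h = normalized_null_vector(A, 0)
residual = [sum(A[r][c] * h[c] for c in range(3)) for r in range(3)]
print(h, residual)
```

With the first component normalized to 1, the computed vector has components 1, -2, 1, and the residual $Ah$ vanishes in all three rows, including the row that was not used in the reduced system.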




The matrix $A(0)$ and the eigenvalue $a(0)$ of Theorem 8 satisfy the hypothesis of
Lemma 9. Then some principal minor $A_{ii}(0) - a(0)I$ of $A(0) - a(0)I$ is invertible; since
$A(t)$ depends continuously on $t$, it follows from Theorem 6 that $A_{ii}(t) - a(t)I$ is
invertible for $t$ small; for such small values of $t$ we set the $i$th component of $h(t)$
equal to 1, and determine the rest of $h$ by formula (15), applied to $A(t) - a(t)I$:

$$h^{(i)}(t) = \bigl(A_{ii}(t) - a(t)I\bigr)^{-1}c^{(i)}(t). \tag{16}$$

Since all terms on the right depend differentiably on $t$, so does $h^{(i)}(t)$. This concludes
the proof of Theorem 8. □


We now extend Lemma 9 to the case when the characteristic polynomial has 
multiple roots and prove the following results. 


Lemma 10. Let $A$ be an $n \times n$ matrix, $p$ its characteristic polynomial. Let $a$ be
some root of $p$ of multiplicity $k$. Then the nullspace of $(A - aI)$ is at most
$k$-dimensional.


Proof. We may, without loss of generality, take $a = 0$. That 0 is a root of
multiplicity $k$ means that

$$p(0) = \frac{dp}{ds}(0) = \cdots = \frac{d^{k-1}p}{ds^{k-1}}(0) = 0, \qquad \frac{d^kp}{ds^k}(0) \ne 0.$$

Proceeding as in the proof of Lemma 9, that is, differentiating $\det(sI - A)$ $k$ times,
we can express the $k$th derivative of $p$ at 0 as a sum of determinants of principal
minors of order $(n-k) \times (n-k)$. Since the $k$th derivative is not equal to 0, it
follows that at least one of these determinants is nonzero, say the minor obtained by
removing from $A$ the $i$th rows and columns, $i = 1, \ldots, k$. Denote this minor as $A^{(k)}$.
We claim that the nullspace $N$ of $A$ contains no vector other than zero whose first $k$
components are all zero. For, suppose $h$ is such a vector; denote by $h^{(k)}$ the vector
obtained from $h$ by removing the first $k$ components. Since $Ah = 0$, this shortened
vector satisfies the equation

$$A^{(k)}h^{(k)} = 0. \tag{17}$$

Since $\det A^{(k)} \ne 0$, $A^{(k)}$ is invertible; therefore it follows from (17) that $h^{(k)} = 0$.
Since the components that were removed are zero, it follows that $h = 0$, a
contradiction.

It follows now that $\dim N \le k$; for, if the dimension of $N$ were greater than $k$, it
would follow from Corollary A of Theorem 1 in Chapter 3 that the $k$ linear
conditions $h_1 = 0, \ldots, h_k = 0$ are satisfied by some nonzero vector $h$ in $N$. Having
just shown that no nonzero vector $h$ in $N$ satisfies these conditions, we conclude that
$\dim N \le k$. □


Lemma 10 can be used to prove Theorem 11, announced in Chapter 6. 

Theorem 11. Let $A$ be an $n \times n$ matrix, $p$ its characteristic polynomial, $a$ some
root of $p$ of multiplicity $k$. The dimension of the space of generalized eigenvectors of
$A$ pertaining to the eigenvalue $a$ is $k$.


Proof. We saw in Chapter 6 that the space of generalized eigenvectors is the
nullspace of $(A - aI)^d$, where $d$ is the index of the eigenvalue $a$. We take $a = 0$. The
characteristic polynomial $p_d$ of $A^d$ can be expressed in terms of the characteristic
polynomial $p$ of $A$ as follows:

$$sI - A^d = \prod_{j=0}^{d-1}\bigl(s^{1/d}I - \omega^jA\bigr),$$

where $\omega$ is a primitive $d$th root of unity. Taking determinants and using the
multiplicative property of determinants we get

$$p_d(s) = \det(sI - A^d) = \prod_{j=0}^{d-1}\det\bigl(s^{1/d}I - \omega^jA\bigr)
= \pm\prod_{j=0}^{d-1}\det\bigl(\omega^{-j}s^{1/d}I - A\bigr) = \pm\prod_{j=0}^{d-1}p\bigl(\omega^{-j}s^{1/d}\bigr). \tag{18}$$

Since $a = 0$ is a root of $p$ of multiplicity $k$, it follows that

$$p(s) \sim \text{const.}\, s^k$$

as $s$ tends to zero. It follows from (18) that as $s$ tends to zero,

$$p_d(s) \sim \text{const.}\, s^k;$$

therefore $p_d$ also has a root of multiplicity $k$ at 0. It follows then from Lemma 10 that
the nullspace of $A^d$ is at most $k$ dimensional.

To show that equality holds, we argue as follows. Denote the roots of $p$ as
$a_1, \ldots, a_j$ and their multiplicities as $k_1, \ldots, k_j$. Since $p$ is a polynomial of degree $n$,
according to the fundamental theorem of algebra,

$$\sum k_i = n. \tag{19}$$

Denote by $N_i$ the space of generalized eigenvectors of $A$ pertaining to the eigenvalue
$a_i$. According to Theorem 7, the spectral theorem, of Chapter 6, every vector can be
decomposed as a sum of generalized eigenvectors: $\mathbb{C}^n = N_1 \oplus \cdots \oplus N_j$. It follows
that

$$n = \sum \dim N_i. \tag{20}$$

$N_i$ is the nullspace of $(A - a_iI)^{d_i}$; we have already shown that

$$\dim N_i \le k_i. \tag{21}$$

Setting this into (20), we obtain

$$n \le \sum k_i.$$

Comparing this with (19), we conclude that in all inequalities (21) the sign of
equality holds. □


We show next how to actually calculate the derivative of the eigenvalue $a(t)$ and
the eigenvector $h(t)$ of a matrix function $A(t)$ when $a(t)$ is a simple root of the
characteristic polynomial of $A(t)$. We start with the eigenvector equation

$$Ah = ah. \tag{22}$$

We have seen in Chapter 5 that the transpose $A^T$ of a matrix $A$ has the same
determinant as $A$. It follows that $A$ and $A^T$ have the same characteristic polynomial.
Therefore if $a$ is an eigenvalue of $A$, it is also an eigenvalue of $A^T$:

$$A^Tl = al. \tag{22'}$$

Since $a$ is a simple root of the characteristic polynomial of $A^T$, by Theorem 11 the
space of eigenvectors satisfying (22)' is one dimensional, and there are no
generalized eigenvectors.

Now differentiate (22) with respect to $t$:

$$\dot{A}h + A\dot{h} = \dot{a}h + a\dot{h}. \tag{23}$$

Let $l$ act on (23):

$$(l, \dot{A}h) + (l, A\dot{h}) = \dot{a}(l, h) + a(l, \dot{h}). \tag{23'}$$

We use now the definition of the transpose, equation (9) of Chapter 3, to rewrite the
second term on the left as $(A^Tl, \dot{h})$. Using equation (22)', we can rewrite this as
$a(l, \dot{h})$, the same as the second term on the right; after cancellation we are left with

$$(l, \dot{A}h) = \dot{a}(l, h). \tag{24}$$

We claim that $(l, h) \ne 0$, so that (24) can be used to determine $\dot{a}$. Suppose on the
contrary that $(l, h) = 0$; we claim that then the equation

$$(A^T - aI)m = l \tag{25}$$

would have a solution $m$. To see this we appeal to Theorem 2' of Chapter 3,
according to which the range of $T = A^T - aI$ consists of those vectors which are
annihilated by the vectors in the nullspace of $T^T = A - aI$. These are the
eigenvectors of $A$ and are multiples of $h$. Therefore if $(l, h) = 0$, $l$ would satisfy
the criterion of belonging to the range of $A^T - aI$, and equation (25) would have a
solution $m$. This $m$ would be a generalized eigenvector of $A^T$, contrary to the fact
that there aren't any.

Having determined $\dot{a}$ from equation (24), we determine $\dot{h}$ from equation (23),
which we rearrange as

$$(A - aI)\dot{h} = (\dot{a} - \dot{A})h. \tag{26}$$

Appealing once more to Theorem 2' of Chapter 3 we note that (26) has a solution $\dot{h}$ if
the right-hand side is annihilated by the nullspace of $A^T - aI$. That nullspace
consists of multiples of $l$, and equation (24) is precisely the requirement that it
annihilate the right-hand side of (26). Note that equation (26) does not determine $\dot{h}$
uniquely, only up to a multiple of $h$. That is as it should be, since the eigenvectors
$h(t)$ are determined only up to a scalar factor that can be taken as an arbitrary
differentiable function of $t$.
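
Formula (24) translates directly into a recipe: find an eigenvector $h$ of $A$, an eigenvector $l$ of the transpose for the same eigenvalue, and divide. The sketch below carries this out for a hypothetical nonsymmetric 2 x 2 function `A(t)` whose larger eigenvalue happens to be $3 + t$; the example matrix and all helper names are assumptions made for illustration.

```python
import math

def A(t):
    """Hypothetical example; its eigenvalues are 1 and 3 + t."""
    return [[1.0 + t, 2.0], [t, 3.0]]

def A_dot(t):
    """Entrywise derivative of A(t)."""
    return [[1.0, 0.0], [1.0, 0.0]]

def larger_eigenvalue(M):
    """Larger root of the characteristic polynomial of a 2 x 2 matrix."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return (tr + math.sqrt(tr * tr - 4.0 * det)) / 2.0

def eigenvector(M, a):
    """A nonzero solution v of (M - aI)v = 0, read off from one row of M - aI."""
    v = [M[0][1], a - M[0][0]]
    if v == [0.0, 0.0]:                   # first row of M - aI vanishes
        v = [a - M[1][1], M[1][0]]
    return v

def eigenvalue_derivative(t):
    """Derivative of the larger eigenvalue, from formula (24)."""
    M, Md = A(t), A_dot(t)
    a = larger_eigenvalue(M)
    h = eigenvector(M, a)
    transpose = [[M[0][0], M[1][0]], [M[0][1], M[1][1]]]
    l = eigenvector(transpose, a)
    Md_h = [Md[0][0] * h[0] + Md[0][1] * h[1], Md[1][0] * h[0] + Md[1][1] * h[1]]
    return (l[0] * Md_h[0] + l[1] * Md_h[1]) / (l[0] * h[0] + l[1] * h[1])

def finite_difference(t, step=1e-6):
    return (larger_eigenvalue(A(t + step)) - larger_eigenvalue(A(t - step))) / (2 * step)

print(eigenvalue_derivative(0.0), finite_difference(0.0))
```

Both numbers should be close to 1, the derivative of $3 + t$. The result does not depend on how $h$ and $l$ are scaled, since each appears once in the numerator and once in the denominator.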


3. MULTIPLE EIGENVALUES 


We are now ready to treat multiple eigenvalues. The occurrence of generalized
eigenvectors is hard to avoid for general matrices and even harder to analyze. For
this reason we shall discuss only self-adjoint matrices, because they have no
generalized eigenvectors. Even in the self-adjoint case we need additional
assumptions to be able to conclude that the eigenvectors of $A$ depend continuously
on a parameter $t$ when $A(t)$ is a differentiable function of $t$. Here is a simple $2 \times 2$
example:

$$A = \begin{pmatrix} b & c \\ c & d \end{pmatrix},$$

$b$, $c$, $d$ functions of $t$, so that $c(0) = 0$, $b(0) = d(0) = 1$. That makes $A(0) = I$,
which has 1 as double eigenvalue.
The eigenvalues $a$ of $A$ are the roots of its characteristic polynomial,

$$a = \frac{b + d \pm \sqrt{(b-d)^2 + 4c^2}}{2}.$$

Denote the eigenvector $h$ as $\binom{x}{y}$. The first component of the eigenvalue equation
$Ah = ah$ is $bx + cy = ax$, from which

$$\frac{y}{x} = \frac{a-b}{c}.$$

Using the abbreviation $(d-b)/c = k$, we can express, for the larger eigenvalue and $c > 0$,

$$\frac{y}{x} = \frac{a-b}{c} = \frac{k + \sqrt{k^2+4}}{2}.$$

We choose $k(t) = \sin(t^{-1})$, $c(t) = \exp(-|t|^{-1})$, and set $b \equiv 1$, $d = 1 + ck$. Clearly
the entries of $A(t)$ are $C^\infty$ functions, yet $y/x$ is discontinuous as $t \to 0$.
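
The oscillation of $y/x$ can be seen by evaluating it along two sequences of $t$ tending to zero, one where $\sin(1/t) = 1$ and one where $\sin(1/t) = -1$. The sketch below does this from the entries of the matrix; the function name `slope` and the sampled values of $t$ are illustrative choices. Only moderately small $t$ are used, because $\exp(-1/|t|)$ underflows in floating point when $t$ is very small.

```python
import math

def slope(t):
    """y/x for the eigenvector belonging to the larger eigenvalue of A(t)."""
    k = math.sin(1.0 / t)
    c = math.exp(-1.0 / abs(t))
    d_minus_b = c * k            # d - b, formed directly to avoid cancellation in (1 + ck) - 1
    a_minus_b = (d_minus_b + math.sqrt(d_minus_b ** 2 + 4.0 * c * c)) / 2.0
    return a_minus_b / c

# Along t = 1/(pi/2 + 2 pi n) we have k = 1; along t = 1/(3 pi/2 + 2 pi n), k = -1.
upper = [slope(1.0 / (math.pi / 2 + 2 * math.pi * n)) for n in range(1, 6)]
lower = [slope(1.0 / (3 * math.pi / 2 + 2 * math.pi * n)) for n in range(1, 6)]
print(upper)
print(lower)
```

Along the first sequence the ratio stays near $(1 + \sqrt{5})/2$, along the second near $(-1 + \sqrt{5})/2$, so $y/x$ has no limit as $t \to 0$.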

Theorem 12 describes an additional condition under which the eigenvectors vary
continuously. To arrive at these conditions we shall reverse the procedure employed
for matrices with simple eigenvalues: we shall first compute the derivatives of
eigenvalues and eigenvectors and prove afterwards that they are differentiable under
the additional condition.

Let $A(t)$ be a differentiable function of the real variable $t$, whose values are
self-adjoint matrices, $A^* = A$. Suppose that at $t = 0$, $A(0)$ has $a_0$ as eigenvalue of
multiplicity $k > 1$, that is, $a_0$ is a $k$-fold root of the characteristic equation of $A(0)$.
According to Theorem 11, the dimension of the generalized eigenspace of $A(0)$
pertaining to the eigenvalue $a_0$ is $k$. Since $A(0)$ is self-adjoint, it has no generalized
eigenvectors; so the eigenvectors $A(0)h = a_0h$ form a $k$-dimensional space which
we denote as $N$.

We take now eigenvectors $h(t)$ and eigenvalues $a(t)$ of $A(t)$, $a(0) = a_0$, presumed
to depend differentiably on $t$. Then the derivatives of $h$ and $a$ satisfy equation (23);
set $t = 0$:

$$\dot{A}h + A\dot{h} = \dot{a}h + a\dot{h}. \tag{27}$$

We recall now from Chapter 8 the projection operators entering the spectral
resolution; see equations (29), (30), (31), and (32). We denote by $P$ the orthogonal
projection onto the eigenspace $N$ of $A$ with eigenvalue $a = a_0$. Since the
eigenvectors of $A$ are orthogonal, it follows [see equations (29)-(32)] that

$$PA = aP. \tag{28}$$

Furthermore, eigenvectors $h$ in $N$ satisfy

$$Ph = h. \tag{28'}$$

Now apply $P$ to both sides of (27):

$$P\dot{A}h + PA\dot{h} = \dot{a}Ph + aP\dot{h}.$$

Using (28) and (28)', we get

$$P\dot{A}Ph + aP\dot{h} = \dot{a}h + aP\dot{h}.$$

The second terms on the right- and left-hand sides are equal, so after cancellation
we get

$$P\dot{A}Ph = \dot{a}h. \tag{29}$$

Since $A(t)$ is self-adjoint, so is $\dot{A}$; and since $P$ is self-adjoint, so is $P\dot{A}P$. Clearly, $P\dot{A}P$
maps $N$ into itself; equation (29) says that $\dot{a}(0)$ must be one of the eigenvalues of
$P\dot{A}P$ on $N$, and $h(0)$ must be an eigenvector.


Theorem 12. Let $A(t)$ be a differentiable function of the real variable $t$ whose
values are self-adjoint matrices. Suppose that at $t = 0$, $A(0)$ has an eigenvalue $a_0$ of
multiplicity $k > 1$. Denote by $N$ the eigenspace of $A(0)$ with eigenvalue $a_0$, and by $P$
the orthogonal projection onto $N$. Assume that the self-adjoint mapping $P\dot{A}(0)P$ of $N$
into $N$ has $k$ distinct eigenvalues $d_i$, $i = 1, \ldots, k$. Denote by $w_i$ corresponding
normalized eigenvectors. Then for $t$ small enough, $A(t)$ has $k$ eigenvalues
$a_j(t)$, $j = 1, \ldots, k$, near $a_0$, with the following properties:

(i) $a_j(t)$ depend differentiably on $t$ and tend to $a_0$ as $t \to 0$.
(ii) For $t \ne 0$, the $a_j(t)$ are distinct.
(iii) The corresponding eigenvector $h_j(t)$:

$$A(t)h_j(t) = a_j(t)h_j(t), \tag{30}$$

can be so normalized that $h_j(t)$ tends to $w_j$ as $t \to 0$.

Proof. For $t$ small enough the characteristic polynomial of $A(t)$ differs little from
that of $A(0)$. By hypothesis, the latter has a $k$-fold root at $a_0$; it follows that the
former has exactly $k$ roots that approach $a_0$ as $t \to 0$. These roots are the
eigenvalues $a_j(t)$ of $A(t)$. According to Theorem 4 of Chapter 8, the corresponding
eigenvectors $h_j(t)$ can be chosen to form an orthonormal set. □


Lemma 13. As $t \to 0$, the distance of each of the normalized eigenvectors $h_j(t)$
from the eigenspace $N$ tends to zero.


Proof. Using the orthogonal projection $P$ onto $N$, we can reformulate the
conclusion as follows:

$$\lim_{t\to 0}\|(I - P)h_j(t)\| = 0, \qquad j = 1, \ldots, k. \tag{31}$$

To show this, we use the fact that as $t \to 0$, $A(t) \to A(0)$ and $a_j(t) \to a_0$; since
$\|h_j(t)\| = 1$, we deduce from equation (30) that

$$A(0)h_j(t) = a_0h_j(t) + \epsilon(t), \tag{32}$$

where $\epsilon(t)$ denotes a vector function that tends to zero as $t \to 0$. Since $N$ consists of
eigenvectors of $A(0)$, and $P$ projects any vector onto $N$,

$$A(0)Ph_j(t) = a_0Ph_j(t). \tag{32'}$$

We subtract (32)' from (32) and get

$$A(0)(I - P)h_j(t) = a_0(I - P)h_j(t) + \epsilon(t). \tag{33}$$

Now suppose (31) were false; then there would be a positive number $d$ and a
sequence of $t \to 0$ such that $\|(I - P)h_j(t)\| > d$. We have shown in Chapter 7 that
there is a subsequence of $t$ for which $(I - P)h_j(t)$ tends to a limit $h$; this limit has
norm $\ge d$. It follows from (33) that this limit satisfies

$$A(0)h = a_0h. \tag{33'}$$

This shows that $h$ belongs to the eigenspace $N$.

On the other hand, each of the vectors $(I - P)h_j(t)$ is orthogonal to $N$; therefore so is
their limit $h$. But since $N$ contains $h$, we have arrived at a contradiction. Therefore
(31) is true. □


We proceed now to prove the continuity of $h_j(t)$ and the differentiability of $a_j(t)$.
Subtract (32)' from (30) and divide by $t$; after the usual Leibniz-ish rearrangement
we get

$$\frac{A(t) - A(0)}{t}\,h(t) + A(0)\,\frac{h(t) - Ph(t)}{t} = \frac{a(t) - a(0)}{t}\,h(t) + a(0)\,\frac{h(t) - Ph(t)}{t}.$$

We have dropped the subscript $j$ to avoid clutter. We apply $P$ to both sides; according
to relation (28), $PA(0) = a_0P$. Since $P^2 = P$ we see that the second terms on the two
sides are zero. So we get

$$P\,\frac{A(t) - A(0)}{t}\,h(t) = \frac{a(t) - a(0)}{t}\,Ph(t). \tag{34}$$

Since $A$ was assumed to be differentiable,

$$\frac{A(t) - A(0)}{t} = \dot{A}(0) + \epsilon(t);$$

and by (31), $h(t) = Ph(t) + \epsilon(t)$. Setting these into (34) we get, using $P^2 = P$, that

$$P\dot{A}(0)P\,Ph(t) = \frac{a(t) - a(0)}{t}\,Ph(t) + \epsilon(t). \tag{35}$$

By assumption, the self-adjoint mapping $P\dot{A}(0)P$ has $k$ distinct eigenvalues $d_i$ on $N$,
with corresponding eigenvectors $w_i$:

$$P\dot{A}(0)Pw_i = d_iw_i, \qquad i = 1, \ldots, k.$$

We expand $Ph(t)$ in terms of these eigenvectors:

$$Ph(t) = \sum x_iw_i, \tag{36}$$

where $x_i$ are functions of $t$, and set it into (35):

$$\sum x_i\left(d_i - \frac{a(t) - a(0)}{t}\right)w_i = \epsilon(t). \tag{35'}$$

Since the $\{w_i\}$ form an orthonormal basis for $N$, we can express the norm of the
left-hand side of (36) in terms of components:

$$\|Ph(t)\|^2 = \sum |x_i|^2.$$

According to (31), $\|Ph(t) - h(t)\|$ tends to zero. Since $\|h(t)\|^2 = 1$, we deduce that

$$\|Ph(t)\|^2 = \sum |x_i(t)|^2 = 1 - \epsilon(t), \tag{37}$$

where $\epsilon(t)$ denotes a scalar function that tends to zero. We deduce from (35)' that

$$\sum \left|d_i - \frac{a(t) - a(0)}{t}\right|^2 |x_i(t)|^2 = \epsilon(t). \tag{37'}$$

Combining (37) and (37)' we deduce that for each $t$ small enough there is an index $j$
such that

$$\begin{aligned}
&\text{(i)}\quad \left|d_j - \frac{a(t) - a(0)}{t}\right| \le \epsilon(t), \\
&\text{(ii)}\quad |x_i(t)| \le \epsilon(t) \quad \text{for } i \ne j, \\
&\text{(iii)}\quad |x_j(t)| = 1 - \epsilon(t).
\end{aligned} \tag{38}$$

Since $x_i(t)$ are continuous functions of $t$ for $t \ne 0$, it follows from (38) that the index
$j$ is independent of $t$ for $t$ small enough.

The normalization $\|h(t)\| = 1$ of the eigenvectors still leaves open a factor of
absolute value 1; we choose this factor so that not only $|x_j|$ but $x_j$ itself is near 1:

$$x_j = 1 - \epsilon(t). \tag{38'}$$

Now we can combine (31), (36), (38)(ii), and (38)' to conclude that

$$\|h(t) - w_j\| \le \epsilon(t). \tag{39}$$

We recall now that the eigenvector $h(t)$ itself was one of a set of $k$ orthonormal
eigenvectors. We claim that distinct eigenvectors $h_j(t)$ are assigned to distinct
vectors $w_j$; for, clearly two orthogonal unit vectors cannot both differ by less than $\epsilon$
from the same vector $w_j$.

Inequality (39) shows that $h_j(t)$, properly normalized, tends to $w_j$ as $t \to 0$.
Inequality (38)(i) shows that $a_j(t)$ is differentiable at $t = 0$ and that its derivative is $d_j$.
It follows that for $t$ small but not equal to 0, $A(t)$ has simple eigenvalues near $a_0$. This
concludes the proof of Theorem 12. □
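
A minimal numerical illustration of Theorem 12 can be given in the 2 x 2 case. In the sketch below the matrix function is $A(t) = I + tM + t^2C$ with symmetric $M$ and $C$ chosen for illustration, so that $A(0) = I$ has the double eigenvalue 1, the eigenspace $N$ is the whole space, $P = I$, and the compressed derivative is simply $M$. These choices, and the helper names, are assumptions and not part of the text.

```python
import math

def eigenvalues_sym2(p, q, r):
    """Eigenvalues (smaller, larger) of the real symmetric matrix [[p, q], [q, r]]."""
    mid = (p + r) / 2.0
    rad = math.hypot((p - r) / 2.0, q)
    return mid - rad, mid + rad

# Entries (p, q, r) of the symmetric matrices used in A(t) = I + t*M + t^2*C.
IDENTITY = (1.0, 0.0, 1.0)
M = (2.0, 1.0, 2.0)       # [[2, 1], [1, 2]], with distinct eigenvalues 1 and 3
C = (1.0, 0.0, -1.0)      # [[1, 0], [0, -1]], a second-order perturbation

def difference_quotients(t):
    """(a_j(t) - a_0) / t for the two eigenvalues of A(t), with a_0 = 1 and t > 0."""
    entries = [e + t * m + t * t * c for e, m, c in zip(IDENTITY, M, C)]
    lo, hi = eigenvalues_sym2(*entries)
    return (lo - 1.0) / t, (hi - 1.0) / t

d_small, d_large = eigenvalues_sym2(*M)
for t in (1e-1, 1e-2, 1e-4):
    print(t, difference_quotients(t))
print(d_small, d_large)
```

As $t$ decreases, the two difference quotients approach the eigenvalues 1 and 3 of $M$, which is the statement that the derivatives of the $a_j(t)$ at $t = 0$ are the $d_j$.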


4. ANALYTIC MATRIX-VALUED FUNCTIONS 


There are further results about differentiability of eigenvectors, the existence of
higher derivatives, but since these are even more tedious than Theorem 12 we shall
not pursue them, except for one observation, due to Rellich. Suppose $A(t)$ is an
analytic function of $t$:

$$A(t) = \sum_{i=0}^{\infty} A_it^i, \tag{40}$$

where each $A_i$ is a self-adjoint matrix. Then also the characteristic polynomial of
$A(t)$ is analytic in $t$. The characteristic equation

$$p(s, t) = 0$$

defines $s$ as a function of $t$. Near a value of $t$ where the roots of $p$ are simple, the roots
$a(t)$ are regular analytic functions of $t$; near a multiple root the roots have an
algebraic singularity and can be expressed as power series in a fractional power of $t$:

$$a(t) = \sum_{i=0}^{\infty} r_it^{i/k}. \tag{40'}$$

On the other hand, we know from Theorem 4 of Chapter 8 that for real $t$, the matrix
$A(t)$ is self-adjoint and therefore all its eigenvalues are real. Since fractional powers
of $t$ have complex values for real $t$, we can deduce that in (40)' only integer powers of
$t$ occur, that is, that the eigenvalues $a(t)$ are regular analytic functions of $t$.


5. AVOIDANCE OF CROSSING 


The discussion at the end of this chapter indicates that multiple eigenvalues of a
matrix function A(t) have to be handled with care, even when the values of the
function are self-adjoint matrices. This brings up the question, How likely is it that
A(t) will have multiple eigenvalues for some values of t? The answer is, "Not very
likely"; before making this precise, we describe a numerical experiment.

Choose a value of n, and then pick at random two real, symmetric n x n matrices 
B and M. Define A(t) to be 


A(t) =B + tM. (41) 


Calculate numerically the eigenvalues of A(t) at a sufficiently dense set of values of
t. The following behavior emerges: as t approaches certain values, a pair of
adjacent eigenvalues $a_1(t)$ and $a_2(t)$ appear to be on a collision course; yet at the last
minute they turn aside:


[Figure: two adjacent eigenvalue curves plotted against t; they approach each other and then turn aside.]


This phenomenon, called avoidance of crossing, was discovered by physicists in the 
early days of quantum mechanics. The explanation of avoidance of crossing was 
given by Wigner and von Neumann; it hinges on the size of the set of real, symmetric 
matrices which have multiple eigenvalues, called degenerate in the physics 
literature. 

The set of all real, symmetric n x n matrices forms a linear space of dimension
N = n(n + 1)/2. There is another way of parametrizing these matrices, namely by
their eigenvectors and eigenvalues. We recall from Chapter 8 that the eigenvalues are
real, and in case they are distinct, the eigenvectors are orthogonal; we shall choose
them to have length 1. The first eigenvector, corresponding to the largest eigenvalue,
depends on n - 1 parameters; the second one, constrained to be orthogonal to the
first eigenvector, depends on n - 2 parameters, and so on, all the way to the (n - 1)st
eigenvector that depends on one parameter. The last eigenvector is then determined,
up to a factor plus or minus 1. The total number of these parameters is
(n - 1) + (n - 2) + ... + 1 = n(n - 1)/2; to these we add the n eigenvalues, for
a total of n(n - 1)/2 + n = n(n + 1)/2 = N parameters, as before.

We turn now to the degenerate matrices, which have two equal eigenvalues, the
rest distinct from it and each other. The first eigenvector, corresponding to the largest
of the simple eigenvalues, depends on n - 1 parameters, the next one on n - 2
parameters, and so on, all the way down to the last simple eigenvector that depends
on two parameters. The remaining eigenspace is then uniquely determined. The total
number of these parameters is (n - 1) + ... + 2 = n(n - 1)/2 - 1; to these we
add the n - 1 distinct eigenvalues, for a total of n(n - 1)/2 - 1 + n - 1 =
n(n + 1)/2 - 2 = N - 2.

This explains the avoidance of crossing: a line or curve lying in N-dimensional
space will in general avoid intersecting a surface depending on N - 2 parameters.
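
For n = 2 the parameter count can be seen directly. The following is a minimal sketch, not taken from the text: the matrices B and M are arbitrary sample data, and the helper names `eigenvalues_2x2` and `gap` are hypothetical. The two eigenvalues of a real symmetric 2 x 2 matrix with entries a, b, b, d coincide only if a = d and b = 0, which is two conditions; along the line A(t) = B + tM there is only one free parameter t, so in general both conditions cannot be met at once and the gap stays positive.

```python
import math

def eigenvalues_2x2(a, b, d):
    """Eigenvalues of the real symmetric matrix [[a, b], [b, d]], smallest first."""
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return mean - radius, mean + radius

# Hypothetical sample data: symmetric 2 x 2 matrices stored as (a, b, d).
B = (1.0, 0.3, -1.0)
M = (-1.0, 0.2, 1.0)

def gap(t):
    """Difference of the two eigenvalues of A(t) = B + t*M."""
    a, b, d = (B[i] + t * M[i] for i in range(3))
    lo, hi = eigenvalues_2x2(a, b, d)
    return hi - lo

ts = [i / 1000.0 for i in range(0, 2001)]   # sample t in [0, 2]
t_star = min(ts, key=gap)                   # where the eigenvalues come closest
min_gap = gap(t_star)
print(t_star, min_gap)
```

For this sample pair the diagonal entries become equal at t = 1, but the off-diagonal entry 0.3 + 0.2t is not zero there, so the two eigenvalue curves approach each other and then separate again without touching.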


EXERCISE 8. (a) Show that the set of all complex, self-adjoint n x n matrices
forms an $N = n^2$-dimensional linear space over the reals.

(b) Show that the set of complex, self-adjoint n x n matrices that have one double
and n - 2 simple eigenvalues can be described in terms of N - 3 real parameters.


EXERCISE 9. Choose in (41) at random two self-adjoint 10 x 10 matrices M and
B. Using available software (MATLAB, MAPLE, etc.) calculate and graph at
suitable intervals the 10 eigenvalues of B + tM as functions of t over some
t-segment.


The graph of the eigenvalues of such a one-parameter family of 12 x 12 
self-adjoint matrices ornaments the cover of this volume; they were computed 
by David Muraki. 


CHAPTER 10 


Matrix Inequalities 


In this chapter we study self-adjoint mappings of a Euclidean space into itself that 
are positive. In Section 1 we state and prove the basic properties of positive
mappings and properties of the relation A < B. In Section 2 we derive some 
inequalities for the determinant of positive matrices. In Section 3 we study the 
dependence of the eigenvalues on the matrix in light of the partial order A < B. In 
Section 4 we show how to decompose arbitrary mappings of Euclidean space into 
itself as a product of self-adjoint and unitary maps. 


1. POSITIVITY 


We recall from Chapter 8 the definition of a positive mapping: 


Definition. A self-adjoint linear mapping H from a real or complex Euclidean
space into itself is called positive if

$$ (x, Hx) > 0 \quad \text{for all } x \ne 0. $$ (1)

Positivity of H is denoted as H > 0 or 0 < H.
We call a self-adjoint map K nonnegative if the associated quadratic form is nonnegative:

$$ (x, Kx) \ge 0 \quad \text{for all } x. $$ (2)

Nonnegativity of K is denoted as $K \ge 0$ or $0 \le K$.
The basic properties of positive maps are contained in the following theorem.

Theorem 1


(i) The identity I is positive. 
(ii) If M and N are positive, so is their sum M + N, as well as aM for any 
positive number a. 
(iii) If H is positive and Q is invertible, then

$$ Q^*HQ > 0. $$ (3)


(iv) H is positive iff all its eigenvalues are positive. 
(v) Every positive mapping is invertible. 
(vi) Every positive mapping has a positive square root, uniquely determined. 
(vii) The set of all positive maps is an open subset of the space of all self- 
adjoint maps. 
(viii) The boundary points of the set of all positive maps are nonnegative maps 
that are not positive. 


Proof. Part (i) is a consequence of the positivity of the scalar product; part (ii) is
obvious. For part (iii) we write the quadratic form associated with $Q^*HQ$ as

$$ (x, Q^*HQx) = (Qx, HQx) = (y, Hy), $$ (3)'

where y = Qx. Since Q is invertible, if $x \ne 0$ then $y \ne 0$, and so by (1) the right-hand side
of (3)' is positive.

To prove (iv), let h be an eigenvector of H, a the eigenvalue: Hh = ah. Taking the
scalar product with h we get

$$ (h, Hh) = a(h, h); $$

clearly, this is positive only if a > 0. This shows that the eigenvalues of a positive
mapping are positive.

To show the converse, we appeal to Theorem 4 of Chapter 8, according to which
every self-adjoint mapping H has an orthonormal basis of eigenvectors. Denote these
by $h_j$ and the corresponding eigenvalues by $a_j$:

$$ Hh_j = a_j h_j. $$ (4)

Any vector x can be expressed as a linear combination of the $h_j$:

$$ x = \sum_j x_j h_j. $$ (4)'

Since the $h_j$ are eigenvectors,

$$ Hx = \sum_j x_j a_j h_j. $$ (4)''

Since the $h_j$ form an orthonormal basis,

$$ (x, x) = \sum_j |x_j|^2, \qquad (x, Hx) = \sum_j a_j |x_j|^2. $$ (5)

It follows from (5) that if all $a_j$ are positive, H is positive.
We deduce from (5) the following sharpening of inequality (1): for a positive
mapping H,

$$ (x, Hx) \ge a \| x \|^2 \quad \text{for all } x, $$ (5)'

where a is the smallest eigenvalue of H.
(v) Every noninvertible map has a nullvector, which is an eigenvector with
eigenvalue zero. Since by (iv) a positive H has all positive eigenvalues, H is invertible.

(vi) We use the existence of an orthonormal basis formed by eigenvectors of H, H
positive. With x expanded as in (4)', we define $\sqrt{H}$ by

$$ \sqrt{H}\, x = \sum_j x_j \sqrt{a_j}\, h_j, $$ (6)

where $\sqrt{a_j}$ denotes the positive square root of $a_j$. Comparing this with the expansion
(4)'' of H itself we can verify that $(\sqrt{H})^2 = H$. Clearly, $\sqrt{H}$ as defined by (6) has
positive eigenvalues, and so by (iv) is positive.

(vii) Let H be any positive mapping, and N any self-adjoint mapping whose
distance from H is less than a,

$$ \| N - H \| < a, $$

where a is the smallest eigenvalue of H. We claim that N is positive. Denote N - H
by M; the assumption is that $\| M \| < a$. This means that for all nonzero x in X,

$$ \| Mx \| < a \| x \|. $$

By the Schwarz inequality, for $x \ne 0$,

$$ |(x, Mx)| \le \| x \| \| Mx \| < a \| x \|^2. $$

Using this and (5)', we see that for $x \ne 0$,

$$ (x, Nx) = (x, (H + M)x) = (x, Hx) + (x, Mx) > a \| x \|^2 - a \| x \|^2 = 0. $$

This shows that H + M = N is positive.
(viii) By definition of boundary, every mapping K on the boundary is the limit of
mappings $H_n > 0$:

$$ \lim_{n \to \infty} H_n = K. $$

It follows from the Schwarz inequality that for every x,

$$ \lim_{n \to \infty} (x, H_n x) = (x, Kx). $$

Since each $H_n$ is positive, and the limit of positive numbers is nonnegative, it follows
that $K \ge 0$. K cannot be positive, for then by part (vii) it would not be on the
boundary. □
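
Part (vi) can be illustrated on a small case. The sketch below is an assumption-laden illustration rather than the construction (6): for 2 x 2 matrices it uses a closed-form identity, a consequence of the Cayley-Hamilton theorem, instead of the expansion in eigenvectors. The function names and the sample matrix are hypothetical.

```python
import math

def sqrt_2x2(h11, h12, h22):
    """Positive square root of the positive symmetric matrix [[h11, h12], [h12, h22]].

    Uses sqrt(H) = (H + s*I) / tau with s = sqrt(det H), tau = sqrt(tr H + 2*s),
    an identity valid for 2 x 2 matrices only.
    """
    s = math.sqrt(h11 * h22 - h12 * h12)      # det H > 0 because H is positive
    tau = math.sqrt(h11 + h22 + 2.0 * s)
    return (h11 + s) / tau, h12 / tau, (h22 + s) / tau

def square_2x2(r11, r12, r22):
    """Entries (1,1), (1,2), (2,2) of R*R for symmetric R = [[r11, r12], [r12, r22]]."""
    return (r11 * r11 + r12 * r12, r12 * (r11 + r22), r12 * r12 + r22 * r22)

H = (2.0, 1.0, 2.0)        # hypothetical example: [[2, 1], [1, 2]], eigenvalues 1 and 3
R = sqrt_2x2(*H)
print(R, square_2x2(*R))
```

Squaring R recovers H up to rounding, and R has positive trace and determinant, so by part (iv) it is positive.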


EXERCISE 1. How many square roots are there of a positive mapping?


Characterizations analogous to parts of Theorem 1 hold for nonnegative
mappings:


EXERCISE 2. Formulate and prove properties of nonnegative mappings similar
to parts (i), (ii), (iii), (iv), and (vi) of Theorem 1.


Based on the notion of positivity we can define a partial order among self-adjoint 
mappings of a given Euclidean space into itself. 


Definition. Let M and N be two self-adjoint mappings of a Euclidean space into
itself. We say that M is less than N, denoted as

$$ M < N \quad \text{or} \quad N > M, $$ (7)

if N - M is positive:

$$ 0 < N - M. $$ (7)'

The relation $M \le N$ is defined analogously.
The following properties are easy consequences of Theorem 1.


Additive Property. If $M_1 < N_1$ and $M_2 < N_2$, then

$$ M_1 + M_2 < N_1 + N_2. $$ (8)

Transitive Property. If L < M and M < N, then L < N.
Multiplicative Property. If M < N and Q is invertible, then

$$ Q^*MQ < Q^*NQ. $$ (9)

The partial ordering defined in (7) and (7)' for self-adjoint maps shares some, but
not all, other properties of the natural ordering of real numbers. For instance,
the reciprocal property holds.


Theorem 2. Let M and N denote positive mappings that satisfy

$$ 0 < M < N. $$ (10)

Then

$$ M^{-1} > N^{-1}. $$ (10)'

First Proof. We start with the case when N = I. By definition, M < I means that
I - M is positive. According to part (iv) of Theorem 1, that means that the
eigenvalues of I - M are positive, that is, that the eigenvalues of M are less than 1.
Since M is positive, the eigenvalues of M lie between 0 and 1. The eigenvalues of
$M^{-1}$ are reciprocals of those of M; therefore the eigenvalues of $M^{-1}$ are greater than
1. That makes the eigenvalues of $M^{-1} - I$ positive; so by part (iv) of Theorem 1,
$M^{-1} - I$ is positive, which makes $M^{-1} > I$.

We turn now to any N satisfying (10); according to part (vi) of Theorem 1, we can
factor $N = R^2$, R > 0. According to part (v) of Theorem 1, R is invertible; we use
now property (9), with $Q = R^{-1}$, to deduce from (10) that

$$ 0 < R^{-1}MR^{-1} < R^{-1}NR^{-1} = I. $$

From what we have already shown, it follows from this relation that the inverse of
$R^{-1}MR^{-1}$ is greater than I:

$$ RM^{-1}R > I. $$

We use once more property (9), with $Q = R^{-1}$, to deduce that

$$ M^{-1} > R^{-1}IR^{-1} = R^{-2} = N^{-1}. $$ □
Second Proof. We shall use the following generally useful calculus lemma.

Lemma 3. Let A(t) be a differentiable function of the real variable t whose
values are self-adjoint mappings; the derivative (d/dt)A is then also self-adjoint.
Suppose that (d/dt)A is positive; then A(t) is an increasing function, that is,

$$ A(s) < A(t) \quad \text{when } s < t. $$ (11)

Proof. Let x be any nonzero vector, independent of t. Then by the assumption
that the derivative of A is positive, we obtain

$$ \frac{d}{dt}(x, Ax) = \Big( x, \frac{dA}{dt}\, x \Big) > 0. $$

So by ordinary calculus, (x, A(t)x) is an increasing function of t:

$$ (x, A(s)x) < (x, A(t)x) \quad \text{for } s < t. $$

This implies that A(t) - A(s) > 0, which is the meaning of (11). □

Let A(t) be as in Lemma 3, and in addition suppose that A(t) is invertible; we
claim that $A^{-1}(t)$ is a decreasing function of t. To see this we differentiate $A^{-1}$, using
Theorem 2 of Chapter 9:

$$ \frac{d}{dt}A^{-1} = -A^{-1}\frac{dA}{dt}A^{-1}. $$

We have assumed that dA/dt is positive, so it follows from part (iii) of Theorem 1
that so is $A^{-1}(dA/dt)A^{-1}$. This shows that the derivative of $A^{-1}(t)$ is negative. It
follows then from Lemma 3 that $A^{-1}(t)$ is decreasing.
We now define

$$ A(t) = M + t(N - M), \qquad 0 \le t \le 1. $$ (12)

Clearly, dA/dt = N - M, positive by assumption (10). It further follows from
assumption (10) that for $0 \le t \le 1$,

$$ A(t) = (1 - t)M + tN $$

is the sum of two positive operators and therefore itself positive. By part (v) of
Theorem 1 we conclude that A(t) is invertible. We can assert now, as shown above,
that $A^{-1}(t)$ is a decreasing function:

$$ A^{-1}(0) > A^{-1}(1). $$

Since A(0) = M, A(1) = N, this is inequality (10)'. This concludes the second proof
of Theorem 2. □
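
The reciprocal property can be checked on a small case. The sketch below is only an illustration under stated assumptions: the matrices M and N are hypothetical sample data, and positivity of a real symmetric 2 x 2 matrix is tested through part (iv) of Theorem 1, since both eigenvalues are positive exactly when the trace and the determinant are.

```python
def is_positive_2x2(m):
    """A real symmetric 2 x 2 matrix is positive iff its trace and determinant are positive."""
    (a, b), (_, d) = m
    return a + d > 0 and a * d - b * b > 0

def inverse_2x2(m):
    """Inverse of an invertible 2 x 2 matrix given as a list of rows."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def subtract(p, q):
    """Entrywise difference p - q of two 2 x 2 matrices."""
    return [[p[i][j] - q[i][j] for j in range(2)] for i in range(2)]

# Hypothetical example matrices with 0 < M < N.
M = [[2.0, 1.0], [1.0, 2.0]]
N = [[4.0, 1.0], [1.0, 3.0]]

print(is_positive_2x2(M), is_positive_2x2(subtract(N, M)))           # hypothesis (10)
print(is_positive_2x2(subtract(inverse_2x2(M), inverse_2x2(N))))     # conclusion (10)'
```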


The product of two self-adjoint mappings is not, in general, self-adjoint. We
introduce the symmetrized product S of two self-adjoint mappings A and B as

$$ S = AB + BA. $$ (13)

The quadratic form associated with the symmetrized product is

$$ (x, Sx) = (x, ABx) + (x, BAx) = (Ax, Bx) + (Bx, Ax). $$ (14)

In the real case

$$ (x, Sx) = 2(Ax, Bx). $$ (14)'

This formula shows that the symmetrized product of two positive mappings need not
be positive; the conditions (x, Ax) > 0 and (x, Bx) > 0 mean that the pairs of vectors
x, Ax and x, Bx make an angle less than $\pi/2$. But these restrictions do not prevent the
vectors Ax, Bx from making an angle greater than $\pi/2$, which would render (14)'
negative.

EXERCISE 3. Construct two real, positive 2 x 2 matrices whose symmetrized
product is not positive.
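
One instance of this phenomenon can be checked by direct computation. The pair below is a hypothetical choice, one of many that would serve; the helper names are not from the text. A negative determinant of S means that its two eigenvalues have opposite signs, so S is not positive.

```python
def matmul(p, q):
    """Product of two 2 x 2 matrices given as lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def det(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

A = [[1, 0], [0, 9]]       # positive: eigenvalues 1 and 9
B = [[10, 9], [9, 10]]     # positive: eigenvalues 1 and 19

AB, BA = matmul(A, B), matmul(B, A)
S = [[AB[i][j] + BA[i][j] for j in range(2)] for i in range(2)]   # formula (13)
print(S, det(S))           # [[20, 90], [90, 180]] -4500
```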


In view of Exercise 3 the following result is somewhat surprising.


Theorem 4. Let A and B denote two self-adjoint maps with the following
properties:

(i) A is positive.
(ii) The symmetrized product S = AB + BA is positive.

Then B is positive.

Proof. Define B(t) as B(t) = B + tA. We claim that for $t \ge 0$ the symmetrized
product of A and B(t) is positive. For

$$ S(t) = AB(t) + B(t)A = AB + BA + 2tA^2 = S + 2tA^2; $$

since S is positive and $2tA^2$ is nonnegative for $t \ge 0$, their sum is positive. We further
claim that for t large enough positive, B(t) is positive. For

$$ (x, B(t)x) = (x, Bx) + t(x, Ax); $$ (15)

A was assumed positive, so by (5)',

$$ (x, Ax) \ge a \| x \|^2, \qquad a > 0. $$

On the other hand, by the Schwarz inequality

$$ |(x, Bx)| \le \| x \| \| Bx \| \le \| B \| \| x \|^2. $$

Putting these inequalities together with (15), we get

$$ (x, B(t)x) \ge (ta - \| B \|) \| x \|^2; $$

clearly this shows that B(t) is positive when $ta > \| B \|$.

Since B(t) depends continuously on t, if B = B(0) were not positive, there
would be some nonnegative value $t_0$ between 0 and $\| B \|/a$, such that $B(t_0)$ lies
on the boundary of the set of positive mappings. According to part (viii) of
Theorem 1, a mapping on the boundary is nonnegative but not positive. Such a
mapping $B(t_0)$ has nonnegative eigenvalues, at least one of which is zero.


150 LINEAR ALGEBRA AND ITS APPLICATIONS 


So there is a nonzero vector y such that $B(t_0)y = 0$. Setting x = y in (14) with
$B = B(t_0)$, we obtain

$$ (y, S(t_0)y) = (Ay, B(t_0)y) + (B(t_0)y, Ay) = 0; $$

this is contrary to the positivity of $S(t_0)$; therefore B is positive. □


In Section 4 we offer a second proof of Theorem 4. 
An interesting consequence of Theorem 4 is the following theorem. 


Theorem 5. Let M and N denote positive mappings that satisfy

$$ 0 < M < N; $$ (16)

then

$$ \sqrt{M} < \sqrt{N}, $$ (16)'

where $\sqrt{\ }$ denotes the positive square root.


Proof. Define the function A(t) as in (12):

$$ A(t) = M + t(N - M). $$

We have shown that A(t) is positive when $0 \le t \le 1$; so we can define

$$ R(t) = \sqrt{A(t)}, \qquad 0 \le t \le 1, $$ (17)

where $\sqrt{\ }$ is the positive square root. It is not hard to show that R(t), the square root
of a differentiable positive function, is differentiable. We square (17), obtaining
$R^2 = A$; differentiating with respect to t, we obtain

$$ \dot{R}R + R\dot{R} = \dot{A}, $$ (18)

where the dot denotes the derivative with respect to t. Recalling the definition (13) of
symmetrized product we can paraphrase (18) as follows: The symmetrized product
of R and $\dot{R}$ is $\dot{A}$.

By hypothesis (16), $\dot{A} = N - M$ is positive; by construction, so is R. Therefore
using Theorem 4 we conclude that $\dot{R}$ is positive on the interval [0, 1]. It follows then
from Lemma 3 that R(t) is an increasing function of t; in particular,

$$ R(0) < R(1). $$

Since $R(0) = \sqrt{A(0)} = \sqrt{M}$ and $R(1) = \sqrt{A(1)} = \sqrt{N}$, inequality (16)'
follows. □

EXERCISE 4. Show that if 0 < M < N, then (a) $M^{1/4} < N^{1/4}$, (b) $M^{1/m} < N^{1/m}$,
m a power of 2, (c) $\log M \le \log N$.

Fractional powers and logarithm are defined by the functional calculus in
Chapter 8. (Hint: $\log M = \lim_{m \to \infty} m[M^{1/m} - I]$.)


EXERCISE 5. Construct a pair of mappings 0 < M < N such that $M^2$ is not less
than $N^2$. (Hint: Use Exercise 3.)


There is a common theme in Theorems 2 and 5 and Exercises 4 and 5 that can be 
expressed by the concept of monotone matrix function. 


Definition. A real-valued function f(s) defined for s > 0 is called a monotone
matrix function if all pairs of self-adjoint mappings M, N satisfying

$$ 0 < M < N $$

also satisfy

$$ f(M) < f(N), $$

where f(M), f(N) are defined by the functional calculus of Chapter 8.


According to Theorems 2 and 5, and Exercise 4, the functions f(s) = -1/s, $s^{1/m}$,
log s are monotone matrix functions. Exercise 5 says $f(s) = s^2$ is not.

Positive multiples, sums, and limits of monotone matrix functions are mmf's.
Thus

$$ -\sum_j \frac{m_j}{s + t_j}, \qquad m_j > 0, \quad t_j > 0, $$

are mmf's, as is

$$ f(s) = as + b - \int_0^{\infty} \frac{dm(t)}{s + t}, $$ (19)

where a is positive, b is real, and m(t) is a nonnegative measure for which the
integral (19) converges.

Carl Loewner has proved the following beautiful theorem. 

Theorem. Every monotone matrix function can be written in the form (19). 




At first glance, this result seems useless, because how can one recognize that a
function f(s) defined on $\mathbb{R}_+$ is of form (19)? There is, however, a surprisingly simple
criterion:

Every function f of form (19) can be extended as an analytic function in the upper
half-plane, and has a positive imaginary part there.


EXERCISE 6. Verify that (19) defines f(z) for a complex argument z as an analytic
function, as well as that Im f(z) > 0 for Im z > 0.


Conversely, a classical theorem of Herglotz and F. Riesz says that every function 
analytic in the upper half-plane whose imaginary part is positive there, and which is 
real on the positive real axis, is of form (19). For a proof, consult the author's text 
entitled Functional Analysis. 

The functions -1/s, $s^{1/m}$, m > 1, log s have positive imaginary parts in the upper
half-plane; the function $s^2$ does not.

Having talked so much about positive mappings, it is time to present some 
examples. Below we describe a method for constructing positive matrices, in fact all 
of them. 


Definition. Let $f_1, \ldots, f_m$ be an ordered set of vectors in a Euclidean space. The
matrix G with entries

$$ G_{ij} = (f_j, f_i) $$ (20)

is called the Gram matrix of the set of vectors.


Theorem 6. (i) Every Gram matrix is nonnegative. 
(ii) The Gram matrix of a set of linearly independent vectors is positive. 
(iii) Every positive matrix can be represented as a Gram matrix. 


Proof. The quadratic form associated with a Gram matrix can be expressed as
follows:

$$ (x, Gx) = \sum_{i,j} x_i \overline{G_{ij}}\, \bar{x}_j = \sum_{i,j} (f_i, f_j)\, x_i \bar{x}_j = \Big( \sum_i x_i f_i, \sum_j x_j f_j \Big) = \Big\| \sum_i x_i f_i \Big\|^2. $$ (20)'

Parts (i) and (ii) follow immediately from (20)'. To prove part (iii), let $(h_{ij}) = H$
be positive. Define for vectors x and y in $\mathbb{C}^n$ the nonstandard scalar product $(\,,)_H$,
defined as

$$ (x, y)_H = (x, Hy), $$

where (,) is the standard scalar product. The Gram matrix of the unit vectors $f_i = e_i$ is
$(e_i, e_j)_H = (e_i, He_j) = h_{ij}$. □


EXAMPLE. Take the Euclidean space to consist of real-valued functions on the
interval [0, 1], with the scalar product

$$ (f, g) = \int_0^1 f(t)\, g(t)\, dt. $$

Choose $f_j = t^{j-1}$, j = 1, ..., n. The associated Gram matrix is

$$ G_{ij} = \frac{1}{i + j - 1}. $$ (21)
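
The positivity of the matrix (21) can be probed in exact arithmetic. The sketch below is illustrative only and its helper names are hypothetical. Each leading k x k block of G is itself the Gram matrix of the linearly independent functions $1, t, \ldots, t^{k-1}$, hence positive by Theorem 6, and a positive matrix has a positive determinant (Theorem 8 of this chapter); so all these determinants should come out positive.

```python
from fractions import Fraction

def hilbert(n):
    """Gram matrix (21): entries 1/(i + j - 1) for i, j = 1..n, as exact fractions."""
    return [[Fraction(1, i + j - 1) for j in range(1, n + 1)] for i in range(1, n + 1)]

def det(m):
    """Determinant by cofactor expansion along the first row (adequate for small n)."""
    if len(m) == 1:
        return m[0][0]
    total = Fraction(0)
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

G = hilbert(4)
minors = [det([row[:k] for row in G[:k]]) for k in range(1, 5)]
print(minors)
```

The determinants are positive but decrease very rapidly with k, which is one reason this family of matrices is a common test case for numerical software.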
EXERCISE 7. Given m positive numbers $r_1, \ldots, r_m$, show that the matrix

$$ G_{ij} = \frac{1}{r_i + r_j + 1} $$ (22)

is positive.

EXAMPLE. Take as scalar product

$$ (f, g) = \int_0^{2\pi} f(\theta)\, \overline{g(\theta)}\, w(\theta)\, d\theta, $$

where w is some given positive real function. Choose $f_j = e^{ij\theta}$, j = -n, ..., n. The
associated (2n + 1) x (2n + 1) Gram matrix is $G_{kj} = c_{k-j}$, where

$$ c_p = \int_0^{2\pi} w(\theta)\, e^{-ip\theta}\, d\theta. $$


We conclude this section with a curious result due to I. Schur. 


Theorem 7. Let $A = (A_{ij})$ and $B = (B_{ij})$ denote positive matrices. Then
$M = (M_{ij})$, whose entries are the products of the entries of A and B,

$$ M_{ij} = A_{ij}B_{ij}, $$ (23)

also is a positive matrix.
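
Theorem 7 is easy to try out on a small case. The following sketch uses hypothetical sample matrices and, as a positivity test for real symmetric 2 x 2 matrices, the criterion that trace and determinant are both positive (equivalent to part (iv) of Theorem 1 in this size).

```python
def is_positive_2x2(m):
    """A real symmetric 2 x 2 matrix is positive iff its trace and determinant are positive."""
    (a, b), (_, d) = m
    return a + d > 0 and a * d - b * b > 0

A = [[2.0, 1.0], [1.0, 1.0]]      # hypothetical positive matrix, det A = 1
B = [[1.0, -2.0], [-2.0, 5.0]]    # hypothetical positive matrix, det B = 1

# Entrywise product, formula (23).
M = [[A[i][j] * B[i][j] for j in range(2)] for i in range(2)]
print(M, is_positive_2x2(A), is_positive_2x2(B), is_positive_2x2(M))
```

Note that M is automatically symmetric, in contrast with the ordinary matrix product AB, which need not be.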


In Appendix 4 we shall give a one-line proof of Theorem 7 using tensor products.


2. THE DETERMINANT OF POSITIVE MATRICES


Theorem 8. The determinant of every positive matrix is positive.


Proof. According to Theorem 3 of Chapter 6, the determinant of a matrix is the
product of its eigenvalues. According to Theorem 1 of this chapter, the eigenvalues
of a positive matrix are positive. Then so is their product. □


Theorem 9. Let A and B denote real, self-adjoint, positive n x n matrices.
Then for all t between 0 and 1,

$$ \det(tA + (1 - t)B) \ge (\det A)^t (\det B)^{1-t}. $$ (24)


Proof. Take the logarithm of both sides. Since log is a monotonic function, we
get the equivalent inequality: for all t in [0, 1],

$$ \log\det(tA + (1 - t)B) \ge t \log\det A + (1 - t)\log\det B. $$ (24)'

We recall the concept of a concave function of a single variable: A function f(x)
is called concave if its graph between two points lies above the chord connecting
those points. Analytically, this means that for all t in [0, 1],

$$ f(ta + (1 - t)b) \ge tf(a) + (1 - t)f(b). $$

Clearly, (24)' can be interpreted as asserting that the function log det H is concave on
the set of positive matrices. Note that it follows from Theorem 1 that for A and B
positive, tA + (1 - t)B is positive when $0 \le t \le 1$. According to a criterion we learn
in calculus, a function whose second derivative is negative is concave. For example,
the function log t, defined for t positive, has second derivative $-1/t^2$, and so it is
concave. To prove (24), we shall calculate the second derivative of the function
f(t) = log det(tA + (1 - t)B) and verify that it is negative. We use formula (10) of
Theorem 4 in Chapter 9, valid for matrix valued functions Y(t) that are
differentiable and invertible:

$$ \frac{d}{dt}\log\det Y = \operatorname{tr}(Y^{-1}\dot{Y}). $$ (25)

In our case, Y(t) = B + t(A - B); its derivative is $\dot{Y} = A - B$, independent of t. So,
differentiating (25) with respect to t, we get

$$ \frac{d^2}{dt^2}\log\det Y = \operatorname{tr}(-Y^{-1}\dot{Y}Y^{-1}\dot{Y}) = -\operatorname{tr}(Y^{-1}\dot{Y})^2. $$ (25)'

Here we have used the product rule, and rules (2) and (3) from Chapter 9
concerning the differentiation of the trace and the reciprocal of matrix functions.
According to Theorem 3 of Chapter 6, the trace of a matrix is the sum of its
eigenvalues; and according to Theorem 4 of Chapter 6, the eigenvalues of the square
of a matrix T are the squares of the eigenvalues of T. Therefore

$$ \operatorname{tr}(Y^{-1}\dot{Y})^2 = \sum_j a_j^2, $$ (26)

where $a_j$ are the eigenvalues of $Y^{-1}\dot{Y}$. According to Theorem 11' in Chapter 8, the
eigenvalues $a_j$ of the product $Y^{-1}\dot{Y}$ of a positive matrix $Y^{-1}$ and a self-adjoint matrix
$\dot{Y}$ are real. It follows that (26) is positive; setting this into (25)', we conclude that the
second derivative of log det Y(t) is negative. □

Second Proof. Define C as $B^{-1}A$; by Theorem 11' of Chapter 8, the product C of
two positive matrices has positive eigenvalues $c_j$. Now rewrite the left-hand side of
(24) as

$$ \det B(tB^{-1}A + (1 - t)I) = \det B \det(tC + (1 - t)I). $$

Divide both sides of (24) by det B; the resulting right-hand side can be rewritten as

$$ (\det A)^t (\det B)^{-t} = (\det C)^t. $$

What is to be shown is that

$$ \det(tC + (1 - t)I) \ge (\det C)^t. $$

Expressing the determinants as the product of eigenvalues gives

$$ \prod_j (tc_j + 1 - t) \ge \prod_j c_j^t. $$

We claim that for all t between 0 and 1 each factor on the left is greater than or equal to the
corresponding factor on the right:

$$ tc + (1 - t) \ge c^t. $$

This is true because $c^t$ is a convex function of t and equality holds when t = 0 or
t = 1. □
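
Inequality (24) can be sampled numerically. The sketch below is a sanity check on one hypothetical pair of 2 x 2 matrices, not a proof; it evaluates both sides of (24) on a grid of values of t and records the smallest difference found.

```python
def det2(m):
    """Determinant of a 2 x 2 matrix given as a list of rows."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

A = [[2.0, 1.0], [1.0, 2.0]]    # hypothetical positive matrix, det A = 3
B = [[1.0, 0.0], [0.0, 4.0]]    # hypothetical positive matrix, det B = 4

worst = float("inf")
for i in range(101):
    t = i / 100.0
    Y = [[t * A[r][c] + (1 - t) * B[r][c] for c in range(2)] for r in range(2)]
    lhs = det2(Y)                                   # det(tA + (1 - t)B)
    rhs = det2(A) ** t * det2(B) ** (1 - t)         # (det A)^t (det B)^(1 - t)
    worst = min(worst, lhs - rhs)
print(worst)
```

The difference is smallest at the endpoints t = 0 and t = 1, where the two sides of (24) agree, and is strictly positive in between for this pair.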
Next we give a useful estimate for the determinant of a positive matrix. 


Theorem 10. The determinant of a positive matrix H does not exceed the
product of its diagonal elements:

$$ \det H \le \prod_i h_{ii}. $$ (27)

Proof. Since H is positive, so are its diagonal entries. Define $d_i = 1/\sqrt{h_{ii}}$, and
denote by D the diagonal matrix with diagonal entries $d_i$. Define the matrix B by

$$ B = DHD. $$

Clearly, B is symmetric and positive and its diagonal entries are all 1's. By the
multiplicative property of determinants,

$$ \det B = \det H \det D^2 = \frac{\det H}{\prod_i h_{ii}}. $$ (28)

So (27) is the same as $\det B \le 1$. To show this, denote the eigenvalues of B by
$b_1, \ldots, b_n$, positive quantities since B is a positive matrix. By the arithmetic-geometric
mean inequality

$$ \prod_i b_i \le \Big( \sum_i b_i / n \Big)^n. $$

We can rewrite this as

$$ \det B \le \Big( \frac{\operatorname{tr} B}{n} \Big)^n. $$ (29)

Since the diagonal entries of B are all 1's, tr B = n, so $\det B \le 1$ follows. □
Theorem 10 has this consequence. 
Theorem 11. Let T be any n x n matrix whose columns are $c_1, c_2, \ldots, c_n$. Then
the determinant of T is in absolute value not greater than the product of the lengths of
its columns:

$$ |\det T| \le \prod_j \| c_j \|. $$ (30)

Proof. Define $H = T^*T$; its diagonal elements are

$$ h_{ii} = \sum_j t^*_{ij} t_{ji} = \sum_j \bar{t}_{ji} t_{ji} = \sum_j |t_{ji}|^2 = \| c_i \|^2. $$

According to Theorem 1, $T^*T$ is positive, except when T is noninvertible, in which
case det T = 0, so there is nothing to prove. We appeal now to Theorem 10 and
deduce that

$$ \det H \le \prod_i \| c_i \|^2. $$

Since the determinant is multiplicative, and since $\det T^* = \overline{\det T}$,

$$ \det H = \det T^* \det T = |\det T|^2. $$

Combining the last two and taking the square root we obtain inequality (30) of
Theorem 11. □


Inequality (30) is due to Hadamard and is useful in applications. In the real case it
has an obvious geometrical meaning: among all parallelepipeds with given side
lengths $\| c_j \|$, the one with the largest volume is rectangular.
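
A small numerical instance of inequality (30), given as a sketch with a hypothetical 3 x 3 matrix T and hypothetical helper names:

```python
import math

def det3(t):
    """Determinant of a 3 x 3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = t
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def column_lengths(t):
    """Euclidean lengths of the three columns of t."""
    return [math.sqrt(sum(row[j] ** 2 for row in t)) for j in range(3)]

T = [[1.0, 2.0, 0.0],
     [0.0, 1.0, 3.0],
     [2.0, 0.0, 1.0]]            # hypothetical example

bound = math.prod(column_lengths(T))    # right-hand side of (30)
print(abs(det3(T)), bound)
```

Here |det T| = 13 while the product of the column lengths is $5\sqrt{10}$, roughly 15.8; the two agree only when the columns are mutually orthogonal, as for a diagonal matrix.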

We return to Theorem 9 about determinants; the first proof we gave for it used the 
differential calculus. We present now a proof based on integral calculus. This proof 
works for real, symmetric matrices; it is based on an integral formula for the 
determinant of real positive matrices. 


Theorem 12. Let H be an n x n real, symmetric, positive matrix. Then

$$ \frac{\pi^{n/2}}{\sqrt{\det H}} = \int_{\mathbb{R}^n} e^{-(x, Hx)}\, dx. $$ (31)

Proof. It follows from inequality (5)' that the integral (31) converges. To evaluate
it, we appeal to the spectral theorem for self-adjoint mappings, see Theorem 4' of
Chapter 8, and introduce new coordinates

$$ x = My, $$ (32)

M an orthogonal matrix so chosen that the quadratic form is diagonalized:

$$ (x, Hx) = (My, HMy) = (y, M^*HMy) = \sum_j a_j y_j^2. $$ (33)

The $a_j$ are the eigenvalues of H. We substitute (33) into (31); since the matrix M is an
isometry, it preserves volume as well: |det M| = 1. In terms of the new variables the
integrand is a product of functions of single variables, so we can rewrite the right
side of (31) as a product of one-dimensional integrals:

$$ \int e^{-\sum_j a_j y_j^2}\, dy = \int \prod_j e^{-a_j y_j^2}\, dy = \prod_j \int e^{-a_j y_j^2}\, dy_j. $$ (34)

The change of variable $\sqrt{a_j}\, y_j = z$ turns each of the integrals on the right in (34) into

$$ \frac{1}{\sqrt{a_j}} \int_{-\infty}^{\infty} e^{-z^2}\, dz. $$

According to a result of calculus

$$ \int_{-\infty}^{\infty} e^{-z^2}\, dz = \sqrt{\pi}, $$ (35)

so that the right-hand side of (34) equals

$$ \frac{\pi^{n/2}}{\prod_j \sqrt{a_j}} = \frac{\pi^{n/2}}{\big( \prod_j a_j \big)^{1/2}}. $$ (34)'

According to formula (15), Theorem 3 in Chapter 6, the determinant of H is the
product of its eigenvalues; so formula (31) of Theorem 12 follows from (34) and
(34)'. □
EXERCISE 8. Look up a proof of the calculus result (35). 
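
Formula (31) can also be checked numerically for n = 2. The sketch below is a rough illustration under stated assumptions: the matrix H is a hypothetical example, the plane is truncated to a square outside which the integrand is negligible, and the integral is approximated by the midpoint rule; the function name and its parameters are not from the text.

```python
import math

def gaussian_integral_2d(h11, h12, h22, half_width=8.0, step=0.05):
    """Midpoint-rule approximation of the integral of exp(-(x, Hx)) over the
    square [-half_width, half_width]^2, for H = [[h11, h12], [h12, h22]]."""
    n = int(round(2 * half_width / step))
    total = 0.0
    for i in range(n):
        x1 = -half_width + (i + 0.5) * step
        for j in range(n):
            x2 = -half_width + (j + 0.5) * step
            q = h11 * x1 * x1 + 2.0 * h12 * x1 * x2 + h22 * x2 * x2   # (x, Hx)
            total += math.exp(-q)
    return total * step * step

H = (2.0, 1.0, 2.0)                       # hypothetical positive matrix, det H = 3
approx = gaussian_integral_2d(*H)
exact = math.pi / math.sqrt(H[0] * H[2] - H[1] ** 2)   # left side of (31) with n = 2
print(approx, exact)
```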


Proof of Theorem 9. We take in formula (31), H = tA + (1 - t)B, where A, B
are arbitrary real, positive matrices:

$$ \frac{\pi^{n/2}}{\sqrt{\det(tA + (1 - t)B)}} = \int_{\mathbb{R}^n} e^{-(x, (tA + (1 - t)B)x)}\, dx = \int_{\mathbb{R}^n} e^{-t(x, Ax)}\, e^{-(1 - t)(x, Bx)}\, dx. $$ (36)

We appeal now to Hölder's inequality:

$$ \int fg\, dx \le \Big( \int f^p\, dx \Big)^{1/p} \Big( \int g^q\, dx \Big)^{1/q}, $$

where p, q are real, positive numbers such that

$$ \frac{1}{p} + \frac{1}{q} = 1. $$

We take

$$ f(x) = e^{-t(x, Ax)}, \qquad g(x) = e^{-(1 - t)(x, Bx)}, $$

and choose p = 1/t, q = 1/(1 - t); we deduce that the integral on the right in (36) is
not greater than

$$ \Big( \int_{\mathbb{R}^n} e^{-(x, Ax)}\, dx \Big)^{t} \Big( \int_{\mathbb{R}^n} e^{-(x, Bx)}\, dx \Big)^{1-t}. $$

Using formula (31) to express these integrals we get

$$ \Big( \frac{\pi^{n/2}}{\sqrt{\det A}} \Big)^{t} \Big( \frac{\pi^{n/2}}{\sqrt{\det B}} \Big)^{1-t} = \frac{\pi^{n/2}}{\sqrt{(\det A)^t (\det B)^{1-t}}}. $$

Since this is an upper bound for (36), inequality (24) follows. □


Formula (31) also can be used to give another proof of Theorem 10. 


Proof. In the integral on the right in (31) we write the vector variable x as
$x = ue_1 + z$, where u is the first component of x and z the rest of them. Then

$$ (x, Hx) = h_{11}u^2 + 2u\,l(z) + (z, H_{11}z), $$

where l(z) is some linear function of z. Setting this into (31) gives

$$ \frac{\pi^{n/2}}{\sqrt{\det H}} = \iint e^{-h_{11}u^2 - 2u\,l(z) - (z, H_{11}z)}\, du\, dz. $$ (37)

Changing the variable u to -u transforms the above integral into

$$ \iint e^{-h_{11}u^2 + 2u\,l(z) - (z, H_{11}z)}\, du\, dz. $$

Adding and dividing by 2 gives

$$ \iint e^{-h_{11}u^2 - (z, H_{11}z)}\, \frac{c + c^{-1}}{2}\, du\, dz, $$ (37)'

where c abbreviates $e^{2u\,l(z)}$. Since c is positive,

$$ \frac{c + c^{-1}}{2} \ge 1. $$

Therefore (37)' is bounded from below by

$$ \iint e^{-h_{11}u^2 - (z, H_{11}z)}\, du\, dz. $$

The integrand is now the product of a function of u and of z, and so is the product of
two integrals, both of which can be evaluated by (31):

$$ \frac{\pi^{1/2}}{\sqrt{h_{11}}} \cdot \frac{\pi^{(n-1)/2}}{\sqrt{\det H_{11}}}. $$

Since this is a lower bound for the right-hand side of (37), we obtain that
$\det H \le h_{11} \det H_{11}$. Inequality (27) follows by induction on the size of H. □


3. EIGENVALUES 


In this section we present a number of interesting and useful results on 
eigenvalues. 


Lemma 13. Let A be a self-adjoint map of a Euclidean space U into itself. We
denote by $p_+(A)$ the number of positive eigenvalues of A, and denote by $p_-(A)$ the
number of its negative eigenvalues. Then

$p_+(A)$ = maximum dimension of subspace S of U such that (Au, u) is positive
on S.

$p_-(A)$ = maximum dimension of subspace S of U such that (Au, u) is negative on S.


Proof. This follows from the minmax characterization of the eigenvalues of A;
see Theorem 10, as well as Lemma 2 of Chapter 8. □


Theorem 14. Let U and A be as in Lemma 13, and let V be a subspace of U
whose dimension is one less than the dimension of U:

$$ \dim V = \dim U - 1. $$

Denote by P orthogonal projection onto V. Then PAP is a self-adjoint map of U into
U that maps V into V; we denote by B the restriction of PAP to V. We claim that

$$ p_+(A) - 1 \le p_+(B) \le p_+(A), $$ $(38)_+$

and

$$ p_-(A) - 1 \le p_-(B) \le p_-(A). $$ $(38)_-$

Proof. Let T denote a subspace of V of dimension $p_+(B)$ on which B is positive:

$$ (Bv, v) > 0, \qquad v \text{ in } T, \ v \ne 0. $$

By definition of B, we can write this as

$$ 0 < (PAPv, v) = (APv, Pv). $$

Since v belongs to T, a subspace of V, Pv = v. So we conclude that A is positive on T;
this proves that

$$ p_+(B) \le p_+(A). $$

To estimate $p_+(B)$ from below, we choose a subspace S of U, of dimension $p_+(A)$,
on which A is positive:

$$ (Au, u) > 0, \qquad u \text{ in } S, \ u \ne 0. $$

Denote the intersection of S and V by T:

$$ T = S \cap V. $$

We claim that the dimension of T is at most one less than the dimension of S:

$$ \dim S - 1 \le \dim T. $$

If S is a subspace of V, then T = S and dim T = dim S. If not, choose a basis in
S: $\{s_1, \ldots, s_k\}$. At least one of these, say $s_1$, does not belong to V; this means that $s_1$
has a nonzero component orthogonal to V. Then we can choose scalars $a_2, \ldots, a_k$
such that

$$ s_2 - a_2 s_1, \ \ldots, \ s_k - a_k s_1 $$

belong to V. They are linearly independent, since $s_1, \ldots, s_k$ are linearly independent.
It follows that

$$ \dim S - 1 \le \dim T, $$

as asserted.
We claim that B is positive on T. Take any $v \ne 0$ in T:

$$ (Bv, v) = (PAPv, v) = (APv, Pv) = (Av, v), $$

since v belongs to V. Since v also belongs to S, (Av, v) > 0.
Since $p_+(B)$ is defined as the dimension of the largest subspace on which B is
positive, and since $\dim T \ge \dim S - 1$, $p_+(B) \ge p_+(A) - 1$ follows. This completes
the proof of $(38)_+$; $(38)_-$ can be proved similarly. □


An immediate consequence of Theorem 14 is the following theorem. 


Theorem 15. Let U, V, A, and B be as in Theorem 14. Denote the eigenvalues
of A as $a_1, \ldots, a_n$, and denote those of B as $b_1, \ldots, b_{n-1}$. The eigenvalues of B
separate the eigenvalues of A:


$a_1 \le b_1 \le a_2 \le \cdots \le b_{n-1} \le a_n$. (39)

Proof. Apply Theorem 14 to $A - c$ and $B - c$. We conclude that the number of $b_i$
less than c is not greater than the number of $a_i$ less than c, and at most one less. We
claim that $a_i \le b_i$; if not, we could choose $b_i < c < a_i$ and obtain a contradiction.
We can show analogously that $b_i \le a_{i+1}$. This proves that $a_i \le b_i \le a_{i+1}$, as asserted
in (39). □


Take U to be $\mathbb{R}^n$ with the standard Euclidean structure, and take A to be any n × n
self-adjoint matrix. Fix i to be some natural number between 1 and n, and take V to
consist of all vectors whose ith component is zero. Theorem 15 says that the
eigenvalues of the ith principal minor of A separate the eigenvalues of A.
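
The separation property can be checked numerically. The following is a minimal sketch, assuming NumPy and a randomly generated real symmetric matrix; the helper name `interlaces`, the size `n`, and the deleted index `i` are illustrative choices, not part of the text.

```python
import numpy as np

def interlaces(a, b, tol=1e-10):
    """True if sorted b (length n-1) separates sorted a (length n) as in (39)."""
    a, b = np.sort(a), np.sort(b)
    return bool(np.all(a[:-1] <= b + tol) and np.all(b <= a[1:] + tol))

rng = np.random.default_rng(0)
n, i = 6, 2                          # hypothetical size and deleted index
X = rng.standard_normal((n, n))
A = (X + X.T) / 2                    # random real symmetric (self-adjoint) matrix

keep = [j for j in range(n) if j != i]
B = A[np.ix_(keep, keep)]            # delete the ith row and column of A

a = np.linalg.eigvalsh(A)            # eigenvalues of A in increasing order
b = np.linalg.eigvalsh(B)            # eigenvalues of the principal submatrix
print(interlaces(a, b))
```

A single random instance illustrates the statement; it is of course no substitute for the proof.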


EXERCISE 9. Extend Theorem 14 to the case when dim V = dim U − m, where
m is greater than 1. 


The following result is of fundamental interest in mathematical physics; see, for 
example, Theorem 4 of Chapter 11. 


Theorem 16. Let M and N denote self-adjoint k × k matrices satisfying


$M < N$. (40)


Denote the eigenvalues of M, arranged in increasing order, by $m_1 \le \cdots \le m_k$, and
those of N by $n_1 \le \cdots \le n_k$. We claim that


$m_j < n_j$, $j = 1, \ldots, k$. (41)


First Proof. We appeal to the minmax principle, Theorem 10 in Chapter 8,
formula (40), according to which


$m_j = \min_{\dim S = j} \; \max_{x \in S,\, x \ne 0} \frac{(x, Mx)}{(x, x)}$, $(42)_m$

$n_j = \min_{\dim S = j} \; \max_{x \in S,\, x \ne 0} \frac{(x, Nx)}{(x, x)}$. $(42)_n$


Denote by T the subspace of dimension j for which the minimum in $(42)_n$ is reached,
and denote by y the vector in T where $(x, Mx)/(x, x)$ achieves its maximum; we take
y to be normalized as $\|y\| = 1$. Then by $(42)_m$,


$m_j \le (y, My)$,


while from $(42)_n$,


$(y, Ny) \le n_j$.


Since the meaning of (40) is that $(y, My) < (y, Ny)$ for all $y \ne 0$, (41)
follows. □


If the hypothesis (40) is weakened to $M \le N$, the weakened conclusion $m_j \le n_j$
can be reached by the same argument.
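
As an illustration of Theorem 16, the sketch below builds a pair M < N by adding a positive matrix to M and compares the ordered eigenvalues. It assumes NumPy; the matrices are random, and the names `P`, `m`, `n` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
k = 5
X = rng.standard_normal((k, k))
M = (X + X.T) / 2                    # self-adjoint M

Y = rng.standard_normal((k, k))
P = Y @ Y.T + 0.1 * np.eye(k)        # positive matrix, so that N - M > 0
N = M + P

m = np.linalg.eigvalsh(M)            # m_1 <= ... <= m_k
n = np.linalg.eigvalsh(N)            # n_1 <= ... <= n_k
print(np.all(m < n))                 # inequality (41), checked entry by entry
```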


Second Proof. We connect M and N by a straight line:

$A(t) = M + t(N - M)$; (43)


we also use calculus, as we have done so profitably in Section 1. Assuming for a
moment that the eigenvalues of A(t) are distinct, we use Theorem 7 of Chapter 9 to
conclude that the eigenvalues of A(t) depend differentiably on t, and we use formula
(24) of that chapter for the value of the derivative. Since A is self-adjoint, we can
identify in this formula the eigenvector l of $A^T$ with the eigenvector h of A itself.
Normalizing h so that $\|h\| = 1$, we have the following version of (24), Chapter 9,
for the derivative of the eigenvalue a in Ah = ah:

$\frac{da}{dt} = \left(h, \frac{dA}{dt} h\right)$. (43)'


For A(t) in (43), $dA/dt = N - M$ is positive according to hypothesis (40); therefore
the right-hand side of (43)' is positive. This proves that da/dt is positive, and
therefore a(t) is an increasing function of t; in particular, a(0) < a(1). Since
A(0) = M, A(1) = N, this proves (41) in case A(t) has distinct eigenvalues for all t
in [0, 1].

In case A(t) has multiple eigenvalues for a finite set of t, the above argument
shows that each $a_j(t)$ is increasing between two such values of t; that is enough to
draw the conclusion (41). Or we can make use of the observation made at the end of
Chapter 9 that the degenerate matrices form a variety of codimension 2 and can be
avoided by changing M by a small amount and passing to the limit. □


The following result is very useful. 


Theorem 17. Let M and N be self-adjoint k × k matrices, $m_j$ and $n_j$ their
eigenvalues arrayed in increasing order. Then


$|n_j - m_j| \le \|M - N\|$. (44)


Proof. Denote $\|M - N\|$ by d. It is easy to see that

$N - dI \le M \le N + dI$. (44)'


Inequality (44) follows from (44)' and (41). □
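
Inequality (44) is easy to test numerically. The sketch below assumes NumPy and takes the norm in (44) to be the operator norm induced by the Euclidean norm, which NumPy computes as `np.linalg.norm(X, 2)`; the random matrices are illustrative.

```python
import numpy as np

def random_self_adjoint(rng, k):
    """Random real symmetric k x k matrix (an illustrative test case)."""
    X = rng.standard_normal((k, k))
    return (X + X.T) / 2

rng = np.random.default_rng(7)
k = 6
M = random_self_adjoint(rng, k)
N = random_self_adjoint(rng, k)

m = np.linalg.eigvalsh(M)            # increasing order
n = np.linalg.eigvalsh(N)
d = np.linalg.norm(M - N, 2)         # operator norm ||M - N||
max_shift = np.max(np.abs(n - m))    # largest |n_j - m_j|
print(max_shift <= d)
```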

EXERCISE 10. Prove inequality (44)'.


Wielandt and Hoffman have proved the following interesting result.


Theorem 18. Let M, N be self-adjoint k × k matrices and $m_j$ and $n_j$ their
eigenvalues arranged in increasing order. Then


$\sum_j (n_j - m_j)^2 \le \|N - M\|_2^2$, (45)


where $\|N - M\|_2$ is the Hilbert-Schmidt norm defined by


$\|C\|_2^2 = \sum_{i,j} |c_{ij}|^2$. (46)


Proof. The Hilbert-Schmidt norm of any matrix can be expressed as a trace:


$\|C\|_2^2 = \operatorname{tr} C^*C$. (46)'

For C self-adjoint,

$\|C\|_2^2 = \operatorname{tr} C^2$. (46)''

Using (46)'' we can rewrite inequality (45) as

$\sum_j (n_j - m_j)^2 \le \operatorname{tr}(N - M)^2$.

Expanding both sides and using the linearity and commutativity of trace gives


$\sum_j \left(n_j^2 - 2 n_j m_j + m_j^2\right) \le \operatorname{tr} N^2 - 2\operatorname{tr}(NM) + \operatorname{tr} M^2$. (47)


According to Theorem 3 of Chapter 6, the trace of $N^2$ is the sum of the eigenvalues
of $N^2$. According to the spectral mapping theorem, the eigenvalues of $N^2$ are $n_j^2$.
Therefore

$\sum_j n_j^2 = \operatorname{tr} N^2$, $\sum_j m_j^2 = \operatorname{tr} M^2$;

so inequality (47) can be restated as

$\sum_j n_j m_j \ge \operatorname{tr}(NM)$. (47)'


To prove this we fix M and consider all self-adjoint matrices N whose eigenvalues
are $n_1, \ldots, n_k$. The set of such matrices N forms a bounded set in the space of all
self-adjoint matrices. By compactness, there is among these that matrix N that
renders the right-hand side of (47)' largest. According to calculus, the maximizing
matrix $N_{\max}$ has the following property: if N(t) is a differentiable function whose
values are self-adjoint matrices with eigenvalues $n_1, \ldots, n_k$, and $N(0) = N_{\max}$, then

$\frac{d}{dt} \operatorname{tr}(N(t)M) \Big|_{t=0} = 0$. (48)

Let A denote any anti-self-adjoint matrix; according to Theorem 5, part (e),
Chapter 9, $e^{At}$ is unitary for any real values of t. Now define


$N(t) = e^{At} N_{\max} e^{-At}$. (49)


Clearly, N(t) is self-adjoint and has the same eigenvalues as $N_{\max}$. According to part
(d) of Theorem 5, Chapter 9,


$\frac{d}{dt} e^{At} = A e^{At} = e^{At} A$.


Using the rules of differentiation developed in Chapter 9, we get, upon
differentiating (49), that


$\frac{d}{dt} N(t) = e^{At} (A N_{\max} - N_{\max} A) e^{-At}$.


Setting this into (48) gives, at t = 0,


$\frac{d}{dt} \operatorname{tr}(N(t)M) \Big|_{t=0} = \operatorname{tr}(A N_{\max} M - N_{\max} A M) = 0$.


Using the commutativity of trace, we can rewrite this as

$\operatorname{tr}(A(N_{\max} M - M N_{\max})) = 0$. (48)'


The commutator of two self-adjoint matrices $N_{\max}$ and M is anti-self-adjoint, so we
may choose


$A = N_{\max} M - M N_{\max}$. (50)

Setting this into (48)' reveals that $\operatorname{tr} A^2 = 0$; since by (46)', for anti-self-adjoint A,


$\operatorname{tr} A^2 = -\sum_{i,j} |a_{ij}|^2$,


we deduce that A = 0, so according to (50) the matrices $N_{\max}$ and M commute.
Such matrices can be diagonalized simultaneously; the diagonal entries are $n_j$ and
$m_j$ in some order. The trace of $N_{\max} M$ can therefore be computed in this
representation as

$\sum_j n_{p_j} m_j$, (51)


where $p_j$, $j = 1, \ldots, k$, is some permutation of $1, \ldots, k$. It is not hard to show, and is
left as an exercise to the reader, that the sum (51) is largest when the $n_j$ are arranged
in the same order as the $m_j$, that is, increasingly. This proves inequality (47)' for $N_{\max}$
and hence for all N. □
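
Inequality (45) and its restated form (47)' can be checked on a random example. This is a minimal sketch assuming NumPy, where `np.linalg.norm(X, 'fro')` computes the Hilbert-Schmidt norm (46); the matrices and the helper name are illustrative.

```python
import numpy as np

def random_self_adjoint(rng, k):
    """Random complex self-adjoint k x k matrix (an illustrative test case)."""
    X = rng.standard_normal((k, k)) + 1j * rng.standard_normal((k, k))
    return (X + X.conj().T) / 2

rng = np.random.default_rng(11)
k = 5
M = random_self_adjoint(rng, k)
N = random_self_adjoint(rng, k)

m = np.linalg.eigvalsh(M)                      # increasing order
n = np.linalg.eigvalsh(N)

lhs_45 = np.sum((n - m) ** 2)                  # left side of (45)
rhs_45 = np.linalg.norm(N - M, 'fro') ** 2     # Hilbert-Schmidt norm squared

lhs_47 = np.sum(n * m)                         # left side of (47)'
rhs_47 = np.trace(N @ M).real                  # tr(NM) is real for self-adjoint N, M
print(lhs_45 <= rhs_45, lhs_47 >= rhs_47)
```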


EXERCISE 11. Show that (51) is largest when $n_j$ and $m_j$ are arranged in the same
order.


The next result is useful in many problems of physics. 


Theorem 19. Denote by $e_{\min}(H)$ the smallest eigenvalue of a self-adjoint
mapping H in a Euclidean space. We claim that $e_{\min}$ is a concave function of H, that
is, that for $0 \le t \le 1$,


$e_{\min}(tL + (1 - t)M) \ge t\,e_{\min}(L) + (1 - t)\,e_{\min}(M)$ (52)


for any pair of self-adjoint maps L and M. Similarly, $e_{\max}(H)$ is a convex function of
H; for $0 \le t \le 1$,


$e_{\max}(tL + (1 - t)M) \le t\,e_{\max}(L) + (1 - t)\,e_{\max}(M)$. (52)'


Proof. We have shown in Chapter 8, equation (37), that the smallest eigenvalue
of a mapping can be characterized as a minimum:


$e_{\min}(H) = \min_{\|x\| = 1} (x, Hx)$. (53)


Let y be a unit vector where (x, Hx), with $H = tL + (1 - t)M$, reaches its minimum.
Then

$e_{\min}(tL + (1 - t)M) = t(y, Ly) + (1 - t)(y, My)$

$\ge t \min_{\|x\| = 1} (x, Lx) + (1 - t) \min_{\|x\| = 1} (x, Mx)$

$= t\,e_{\min}(L) + (1 - t)\,e_{\min}(M)$.


This proves (52). Since $-e_{\max}(A) = e_{\min}(-A)$, the convexity of $e_{\max}(A)$
follows. □


Note that the main thrust of the argument above is that any function characterized 
as the minimum of a collection of linear functions is concave. 
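
The concavity (52) and convexity (52)' can be sampled along a segment between two self-adjoint matrices. The sketch below assumes NumPy; the helper names `e_min`, `e_max` and the random test matrices are illustrative.

```python
import numpy as np

def e_min(H):
    """Smallest eigenvalue of a self-adjoint matrix H."""
    return float(np.linalg.eigvalsh(H)[0])

def e_max(H):
    """Largest eigenvalue of a self-adjoint matrix H."""
    return float(np.linalg.eigvalsh(H)[-1])

def random_symmetric(rng, k):
    X = rng.standard_normal((k, k))
    return (X + X.T) / 2

rng = np.random.default_rng(2)
L = random_symmetric(rng, 5)
M = random_symmetric(rng, 5)

ts = np.linspace(0.0, 1.0, 11)
# Left side minus right side of (52): should be nonnegative.
concave_gap = [e_min(t * L + (1 - t) * M) - (t * e_min(L) + (1 - t) * e_min(M)) for t in ts]
# Right side minus left side of (52)': should be nonnegative.
convex_gap = [(t * e_max(L) + (1 - t) * e_max(M)) - e_max(t * L + (1 - t) * M) for t in ts]
print(min(concave_gap) >= -1e-12, min(convex_gap) >= -1e-12)
```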


4. REPRESENTATION OF ARBITRARY MAPPINGS


Every linear mapping Z of a complex Euclidean space into itself can be
decomposed, uniquely, as a sum of a self-adjoint mapping and an anti-self-adjoint
one:


$Z = H + A$, (54)

where

$H^* = H$, $A^* = -A$. (54)'


Clearly, if (54) and (54)' hold, $Z^* = H^* + A^* = H - A$, so H and A are given by


$H = \frac{Z + Z^*}{2}$, $A = \frac{Z - Z^*}{2}$.


H is called the self-adjoint part of Z, A the anti-self-adjoint part.

Theorem 20. Suppose the self-adjoint part of Z is positive:

$Z + Z^* > 0$.

Then the eigenvalues of Z have positive real part.


Proof. Using the conjugate symmetry of scalar product in a complex Euclidean
space, and the definition of adjoint, we have the following identity for any vector h:


$2\,\mathrm{Re}(Zh, h) = (Zh, h) + \overline{(Zh, h)} = (Zh, h) + (h, Zh) = (Zh, h) + (Z^*h, h) = ((Z + Z^*)h, h)$.


Since we assumed in Theorem 20 that $Z + Z^*$ is positive, we conclude that for any
vector $h \ne 0$, (Zh, h) has positive real part.

Let h be an eigenvector for Z of norm $\|h\| = 1$, with z the corresponding
eigenvalue, Zh = zh. Then (Zh, h) = z has positive real part. □
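
The decomposition (54) and the conclusion of Theorem 20 can be illustrated numerically. The following sketch assumes NumPy; the matrix Z is built from a randomly chosen positive self-adjoint part and an arbitrary anti-self-adjoint part, both illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
k = 4

Y = rng.standard_normal((k, k)) + 1j * rng.standard_normal((k, k))
H = Y @ Y.conj().T + 0.1 * np.eye(k)     # positive self-adjoint part

X = rng.standard_normal((k, k)) + 1j * rng.standard_normal((k, k))
A = (X - X.conj().T) / 2                 # anti-self-adjoint part, A* = -A

Z = H + 5 * A                            # the factor 5 is an arbitrary choice

# Recover the two parts of Z from the formulas following (54)'.
H_part = (Z + Z.conj().T) / 2
A_part = (Z - Z.conj().T) / 2

eigs = np.linalg.eigvals(Z)
print(eigs.real.min() > 0)               # all eigenvalues have positive real part
```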


In Appendix 14 we give a far-reaching extension of Theorem 20. 

Theorem 20 can be used to give another proof of Theorem 4 about symmetrized
products: Let A and B be self-adjoint maps, and assume that A and AB + BA = S
are positive. We claim that then B is positive.


Second Proof of Theorem 4. Since A is positive, it has according to Theorem 1 a
square root $A^{1/2}$ that is invertible. We multiply the relation


$AB + BA = S$

by $A^{-1/2}$ from the right and the left:

$A^{1/2} B A^{-1/2} + A^{-1/2} B A^{1/2} = A^{-1/2} S A^{-1/2}$. (55)


We introduce the abbreviation


$A^{1/2} B A^{-1/2} = Z$ (56)


and rewrite (55) as


$Z + Z^* = A^{-1/2} S A^{-1/2}$. (55)'


Since S is positive, so, according to Theorem 1, is $A^{-1/2} S A^{-1/2}$; it follows from (55)'
that $Z + Z^*$ is positive. By Theorem 20 the eigenvalues of Z have positive real part.

Formula (56) shows that Z and B are similar; therefore they have the same
eigenvalues. Since B is self-adjoint, it has real eigenvalues; so we conclude that the
eigenvalues of B are positive. This, according to Theorem 1, guarantees that B is
positive. □


EXERCISE 12. Prove that if the self-adjoint part of Z is positive, then Z is
invertible, and the self-adjoint part of $Z^{-1}$ is positive.


The decomposition of an arbitrary Z as a sum of its self-adjoint and anti-self-adjoint
parts is analogous to writing a complex number as the sum of its real and imaginary
parts, and the norm is analogous to the absolute value. The next result strengthens this
analogy. Let a denote any complex number with positive real part; then


$z \to \frac{1 - az}{1 + \bar{a}z} = w$


maps the right half-plane Re z > 0 onto the unit disc |w| < 1. Analogously, we claim
the following:


Theorem 21. Let a be a complex number with Re a > 0. Let Z be a mapping
whose self-adjoint part $Z + Z^*$ is positive. Then


$W = (I - aZ)(I + \bar{a}Z)^{-1}$ (57)

is a mapping of norm less than 1. Conversely, $\|W\| < 1$ implies that $Z + Z^* > 0$.


Proof. According to Theorem 20 the eigenvalues z of Z have positive real part. It
follows that the eigenvalues of $I + \bar{a}Z$, $1 + \bar{a}z$, are $\ne 0$; therefore $I + \bar{a}Z$ is invertible.
For any vector x, denote $(I + \bar{a}Z)^{-1}x = y$; then by (57),


$(I - aZ)y = Wx$,

and by definition of y,


$(I + \bar{a}Z)y = x$.


The condition $\|W\| < 1$ means that $\|Wx\|^2 < \|x\|^2$ for all $x \ne 0$; in terms of y
this can be expressed as


$\|y - aZy\|^2 < \|y + \bar{a}Zy\|^2$. (58)

Expanding both sides gives


$\|y\|^2 + |a|^2\|Zy\|^2 - a(Zy, y) - \bar{a}(y, Zy) < \|y\|^2 + |a|^2\|Zy\|^2 + \bar{a}(Zy, y) + a(y, Zy)$. (59)


Cancelling identical terms and rearranging gives

$0 < (a + \bar{a})[(Zy, y) + (y, Zy)] = 2\,\mathrm{Re}\,a\,([Z + Z^*]y, y)$. (60)


Since we have assumed that Re a is positive and that $Z + Z^* > 0$, (60) is true.
Conversely, if (60) holds for all y, $Z + Z^*$ is positive. □
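
Formula (57) can be evaluated directly for a sample Z. The sketch below assumes NumPy and takes the norm to be the operator norm `np.linalg.norm(W, 2)`; the matrix Z and the number `a` are illustrative choices satisfying the hypotheses of Theorem 21.

```python
import numpy as np

rng = np.random.default_rng(4)
k = 4

Y = rng.standard_normal((k, k)) + 1j * rng.standard_normal((k, k))
H = Y @ Y.conj().T + 0.1 * np.eye(k)          # positive self-adjoint part
X = rng.standard_normal((k, k)) + 1j * rng.standard_normal((k, k))
Z = H + (X - X.conj().T) / 2                  # add an anti-self-adjoint part

a = 0.7 + 1.3j                                # any complex a with Re a > 0
I = np.eye(k)

# W = (I - aZ)(I + conj(a) Z)^{-1}, formula (57).
W = (I - a * Z) @ np.linalg.inv(I + np.conj(a) * Z)
norm_W = np.linalg.norm(W, 2)
print(norm_W < 1)
```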


Complex numbers z have not only additive but multiplicative decompositions:
$z = re^{i\theta}$, $r > 0$, $|e^{i\theta}| = 1$. Mappings of Euclidean spaces have similar decompositions.


Theorem 22. Let A be a linear mapping of a complex Euclidean space into
itself. Then A can be factored as


$A = RU$, (61)


where R is a nonnegative self-adjoint mapping, and U is unitary. When A is
invertible, R is positive.


Proof. Take first the case that A is invertible; then so is $A^*$. For any $x \ne 0$,

$(AA^*x, x) = (A^*x, A^*x) = \|A^*x\|^2 > 0$.


This proves that $AA^*$ is a positive mapping. According to Theorem 1, $AA^*$ has a
unique positive square root R:


$AA^* = R^2$. (62)

Define U as $R^{-1}A$; then $U^* = A^*R^{-1}$, and so by (62),


$UU^* = R^{-1}AA^*R^{-1} = R^{-1}R^2R^{-1} = I$.

It follows that U is unitary. By definition of U as $R^{-1}A$,

$A = RU$,


as asserted in (61).
When A is not invertible, $AA^*$ is a nonnegative self-adjoint map; it has a uniquely
determined nonnegative square root R. Therefore


$\|Rx\|^2 = (Rx, Rx) = (R^2x, x) = (AA^*x, x) = (A^*x, A^*x) = \|A^*x\|^2$. (63)


Suppose Rx = Ry; then $\|R(x - y)\| = 0$, and so according to (63),
$\|A^*(x - y)\| = 0$, therefore $A^*x = A^*y$. This shows that for any u in the range of
R, u = Rx, we can define Vu as $A^*x$. According to (63), V is an isometry; therefore it
can be extended to the whole space as a unitary mapping.

By definition, $A^* = VR$; taking its adjoint gives $A = RV^*$, which is relation (61)
with $V^* = U$. □


According to the spectral representation theorem, the self-adjoint map R can be
expressed as $R = WDW^*$, where D is diagonal and W is unitary. Setting this into
(61) gives $A = WDW^*U$. Denoting $W^*U$ as V, we get


$A = WDV$, (64)

where W and V are unitary and D is diagonal, with nonnegative entries. Equation
(64) is called the singular value decomposition of the mapping A. The diagonal
entries of D are called the singular values of A; they are the nonnegative square roots
of the eigenvalues of $AA^*$.
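
The relation between (61) and (64) can be traced in code. This is a minimal sketch assuming NumPy, whose `np.linalg.svd` returns unitary factors and nonnegative singular values in the shape of (64); the polar factors are then assembled as R = WDW* and U = WV. The random matrix A is illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
k = 4
A = rng.standard_normal((k, k)) + 1j * rng.standard_normal((k, k))

# NumPy returns A = Ws @ diag(s) @ Vh with Ws, Vh unitary and s >= 0, as in (64).
Ws, s, Vh = np.linalg.svd(A)
D = np.diag(s)

# Polar factors of (61): R = W D W*, U = W V, so that A = R U.
R = Ws @ D @ Ws.conj().T
U = Ws @ Vh

# Singular values versus eigenvalues of AA* (eigvalsh returns increasing order).
eig_AAstar = np.linalg.eigvalsh(A @ A.conj().T)
roots = np.sqrt(np.clip(eig_AAstar, 0.0, None))[::-1]
print(np.allclose(R @ U, A), np.allclose(roots, s))
```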

Take the adjoint of both sides of (61); we get

$A^* = U^*R$. (61)*

Denote $A^*$ as B, denote $U^*$ as M and R as S, and restate (61)* as

Theorem 22'. Every linear mapping B of a complex Euclidean space can be
factored as

$B = MS$,

where S is self-adjoint and nonnegative, and M is unitary.


Note. When B maps a real Euclidean space into itself, so do S and M. 


EXERCISE 13. Let A be any mapping of a Euclidean space into itself. Show that
$AA^*$ and $A^*A$ have the same eigenvalues with the same multiplicity.


EXERCISE 14. Let A be a mapping of a Euclidean space into another Euclidean
space. Show that $AA^*$ and $A^*A$ have the same nonzero eigenvalues with the same
multiplicity.


EXERCISE 15. Give an example of a 2 × 2 matrix Z whose eigenvalues have
positive real part but $Z + Z^*$ is not positive.


EXERCISE 16. Verify that the commutator (50) of two self-adjoint matrices is 
anti-self-adjoint. 


CHAPTER 11 


Kinematics and Dynamics 


In this chapter we shall illustrate how extremely useful the theory of linear algebra in 
general and matrices in particular are for describing motion in space. There are three 
sections, on the kinematics of rigid body motions, on the kinematics of fluid flow, 
and on the dynamics of small vibrations. 


1. THE MOTION OF RIGID BODIES 


An isometry was defined in Chapter 7 as a mapping of a Euclidean space into itself 
that preserves distances. When the isometry relates the positions of a mechanical 
system in three-dimensional real space at two different times, it is called a rigid body 
motion. In this section we shall study such motions. 

Theorem 10 of Chapter 7 shows that an isometry M that preserves the origin is
linear and satisfies


$M^*M = I$. (1)


As noted in equation (33) of that chapter, the determinant of such an isometry is plus
or minus 1; its value for all rigid body motions is 1.


Theorem 1 (Euler). An isometry M of three-dimensional real Euclidean space
with determinant plus 1 that is nontrivial, that is not equal to I, is a rotation; it has a
uniquely defined axis of rotation and angle of rotation $\theta$.


Proof. Points f on the axis of rotation remain fixed, so they satisfy


$Mf = f$; (2)

that is, they are eigenvectors of M with eigenvalue 1. We claim that a nontrivial
isometry, det M = 1, has exactly one eigenvalue equal to 1. To see this, look at the
characteristic polynomial of M, $p(s) = \det(sI - M)$. Since M is a real matrix, p(s)
has real coefficients. The leading term in p(s) is $s^3$, so p(s) tends to $+\infty$ as s tends
to $+\infty$. On the other hand, $p(0) = \det(-M) = -\det M = -1$. So p has a root on
the positive axis; that root is an eigenvalue of M. Since M is an isometry, that
eigenvalue can only be plus 1. Furthermore, 1 is a simple eigenvalue; for if a second
eigenvalue were equal to 1, then, since the product of all three eigenvalues equals
det M = 1, the third eigenvalue of M would also be 1. Since M is a normal matrix, it
has a full set of eigenvectors, all with eigenvalue 1; that would make M = I,
excluded as the trivial case.

To see that M is a rotation around the axis formed by the fixed vectors, we
represent M in an orthonormal basis consisting of f satisfying (2), and two other
vectors. In this basis the column vector (1, 0, 0) is an eigenvector of M with
eigenvalue 1; so the first column is (1, 0, 0). Since the columns of an isometry are
orthogonal unit vectors and det M = 1, the matrix M has the form


$M = \begin{pmatrix} 1 & 0 & 0 \\ 0 & c & -s \\ 0 & s & c \end{pmatrix}$, (3)


where $c^2 + s^2 = 1$. Thus $c = \cos\theta$, $s = \sin\theta$, $\theta$ some angle. Clearly, (3) is rotation
around the first axis by angle $\theta$. □


The rotation angle is easily calculated without introducing a new basis that brings
M into form (3). We recall the definition of trace from Chapter 6 and Theorem 2 in
that chapter, according to which similar matrices have the same trace. Therefore, M
has the same trace in every basis; from (3),


$\operatorname{tr} M = 1 + 2\cos\theta$, (4)

hence

$\cos\theta = \frac{\operatorname{tr} M - 1}{2}$. (4)'
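
Formula (4)' and the eigenvector characterization (2) translate directly into code. The sketch below assumes NumPy; the test rotation is built with Rodrigues' formula, which is not part of the text, and the function names are illustrative. The angle returned by `arccos` lies in [0, π].

```python
import numpy as np

def rotation_about(axis, theta):
    """Rodrigues' formula: rotation by theta about the unit vector along axis."""
    n = np.asarray(axis, dtype=float)
    n = n / np.linalg.norm(n)
    K = np.array([[0.0, -n[2], n[1]],
                  [n[2], 0.0, -n[0]],
                  [-n[1], n[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def rotation_angle(M):
    """Angle from formula (4)': cos(theta) = (tr M - 1) / 2."""
    return float(np.arccos(np.clip((np.trace(M) - 1) / 2, -1.0, 1.0)))

def rotation_axis(M):
    """Unit eigenvector of M for the eigenvalue closest to 1, as in (2)."""
    w, V = np.linalg.eig(M)
    f = np.real(V[:, np.argmin(np.abs(w - 1))])
    return f / np.linalg.norm(f)

M = rotation_about([1.0, 2.0, 2.0], 0.9)   # hypothetical axis and angle
theta = rotation_angle(M)
f = rotation_axis(M)
print(theta, f)
```

The axis is determined only up to sign, so a comparison with a known axis should use the absolute value of the scalar product.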


We turn now to rigid motions which keep the origin fixed and which depend on
time t, that is, functions M(t) whose values are rotations. We take M(t) to be the
rotation that brings the configuration at time 0 into the configuration at time t. Thus


$M(0) = I$. (5)


If we change the reference time from 0 to $t_1$, the function $M_1$ describing the motion
from $t_1$ to t is


$M_1(t) = M(t)M^{-1}(t_1)$. (6)

Equation (1) shows that $M^*$ is left inverse of M; then it is also right inverse:

$MM^* = I$. (7)


We assume that M(t) is a differentiable function of t. Differentiating (7) with respect
to t and denoting the derivative by the subscript t gives


$M_t M^* + M M_t^* = 0$. (8)

We denote

$M_t M^* = A$. (9)

Since differentiation and taking the adjoint commute,

$A^* = M M_t^*$;

therefore (8) can be written as

$A + A^* = 0$. (10)


This shows that A(t) is antisymmetric. Equation (9) itself can be rewritten by
multiplying by M on the right and using (1):


$M_t = AM$. (11)

Note that if we differentiate (6) and use (11) we get the same equation


$(M_1)_t = A M_1$. $(11)_1$


This shows that the significance of A(t) for the motion is independent of the reference
time; A(t) is called the infinitesimal generator of the motion.
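
The antisymmetry (10) of the generator defined in (9) can be observed numerically. The sketch below assumes NumPy and a hypothetical motion M(t), a product of two elementary rotations with M(0) = I; the derivative M_t is approximated by a central difference, so the result is only accurate up to discretization error.

```python
import numpy as np

def rot_z(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_x(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def motion(t):
    """Hypothetical rigid motion M(t) with M(0) = I and a moving axis."""
    return rot_z(t) @ rot_x(2 * t)

def generator(t, h=1e-5):
    """Approximate A(t) = M_t M* of (9) by a central difference in t."""
    M_t = (motion(t + h) - motion(t - h)) / (2 * h)
    return M_t @ motion(t).T

A = generator(0.4)
asym_defect = np.max(np.abs(A + A.T))    # should vanish by (10), up to O(h^2)
print(asym_defect)
```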


EXERCISE 1. Show that if M(t) satisfies a differential equation of form (11),
where A(t) is antisymmetric for each t and the initial condition (5), then M(t) is a
rotation for every t.


EXERCISE 2. Suppose that A is independent of t; show that the solution of
equation (11) satisfying the initial condition (5) is


$M(t) = e^{tA}$. (12)

EXERCISE 3. Show that when A depends on t, equation (11) is not solved by

$M(t) = e^{\int_0^t A(s)\,ds}$,

unless A(t) and A(s) commute for all s and t.

We investigate now M(t) near t = 0; we assume that $M(t) \ne I$ for $t \ne 0$; then for
each $t \ne 0$, M(t) has a unique axis of rotation f(t):


$M(t)f(t) = f(t)$.


We assume that f(t) depends differentiably on t; differentiating the preceding
formula gives


$M_t f + M f_t = f_t$.


We assume that both f(t) and $f_t(t)$ have limits as $t \to 0$. Letting $t \to 0$ in this formula
gives


$M_t f(0) + M(0) f_t = f_t$. (13)

Using (11) and (5), we get


$A(0) f(0) = 0$. (14)


We claim that if $A(0) \ne 0$ then this equation has essentially one solution, that is, all
are multiples of each other. To see that there is a nontrivial solution, recall that A
is antisymmetric; for n odd,


$\det A = \det A^* = \det(-A) = (-1)^n \det A = -\det A$,


from which it follows that det A = 0, that is, the determinant of an antisymmetric
matrix of odd order is zero. This proves that A is not invertible, so that (14) has a
nontrivial solution. This fact can also be seen directly for 3 × 3 matrices by writing
out


$A = \begin{pmatrix} 0 & a & b \\ -a & 0 & c \\ -b & -c & 0 \end{pmatrix}$. (15)

Inspection shows that

$f = \begin{pmatrix} -c \\ b \\ -a \end{pmatrix}$ (16)


lies in the nullspace of A.


EXERCISE 4. Show that if A in (15) is not equal to 0, then all vectors annihilated 
by A are multiples of (16). 


EXERCISE 5. Show that the two other eigenvalues of A are $\pm i\sqrt{a^2 + b^2 + c^2}$.


EXERCISE 6. Show that the motion M(t) described by (12) is rotation around the
axis through the vector f given by formula (16). Show that the angle of rotation is
$t\sqrt{a^2 + b^2 + c^2}$. (Hint: Use formula (4)'.)

The one-dimensional subspace spanned by f(0) satisfying (14), being the limit of
the axes of rotation f(t), is called the instantaneous axis of rotation of the motion at
t = 0.

Let $\theta(t)$ denote the angle through which M(t) rotates. Formula (4)' shows that $\theta(t)$
is a differentiable function of t; since M(0) = I, it follows that tr M(0) = 3, and so
by (4)' $\cos\theta(0) = 1$. This shows that $\theta(0) = 0$.

We determine now the derivative of $\theta$ at t = 0. For this purpose we differentiate
(4) twice with respect to t. Since trace is a linear function of matrices, the derivative
of the trace is the trace of the derivative, and so we get


$-\theta_{tt} \sin\theta - \theta_t^2 \cos\theta = \frac{1}{2} \operatorname{tr} M_{tt}$.

Setting t = 0 gives

$\theta_t^2(0) = -\frac{1}{2} \operatorname{tr} M_{tt}(0)$. (17)

To express $M_{tt}(0)$ we differentiate (11):

$M_{tt} = A_t M + A M_t = A_t M + A^2 M$.

Setting t = 0 gives

$M_{tt}(0) = A_t(0) + A^2(0)$.

Take the trace of both sides. Since A(t) is antisymmetric for every t, so is $A_t$; the trace
of an antisymmetric matrix being zero, we get $\operatorname{tr} M_{tt}(0) = \operatorname{tr} A^2(0)$. Using formula
(15), a brief calculation gives

$\operatorname{tr} A^2(0) = -2(a^2 + b^2 + c^2)$.

Combining the last two relations and setting it into (17) gives

$\theta_t^2(0) = a^2 + b^2 + c^2$.

Compare this with (16); we get

$|\theta_t| = \|f\|$. (18)


The quantity $\theta_t$ is called the instantaneous angular velocity of the motion; the vector
f given by (16) is called the instantaneous angular velocity vector.
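
The relations (14), (16), and (18) can be checked for a constant generator. The sketch below assumes NumPy, takes the motion to be the exponential (12) of a constant antisymmetric A, and evaluates the exponential by a truncated power series (an illustrative shortcut that is adequate for small matrices of small norm). The values of a, b, c and t are hypothetical.

```python
import numpy as np

def generator(a, b, c):
    """Antisymmetric matrix of formula (15)."""
    return np.array([[0.0, a, b], [-a, 0.0, c], [-b, -c, 0.0]])

def null_vector(a, b, c):
    """Vector f of formula (16)."""
    return np.array([-c, b, -a])

def expm_series(X, terms=40):
    """Truncated power series for the matrix exponential of X."""
    result = np.eye(X.shape[0])
    term = np.eye(X.shape[0])
    for n in range(1, terms):
        term = term @ X / n
        result = result + term
    return result

a, b, c = 0.3, -1.2, 0.5
A = generator(a, b, c)
f = null_vector(a, b, c)

t = 0.01
M = expm_series(t * A)                                # motion (12) at a small time t
theta = np.arccos(np.clip((np.trace(M) - 1) / 2, -1.0, 1.0))   # formula (4)'
rate = theta / t                                      # approximates |theta_t| at t = 0
print(rate, np.linalg.norm(f))
```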

EXERCISE 7. Show that the commutator

$[A, B] = AB - BA$

of two antisymmetric matrices is antisymmetric.

EXERCISE 8. Let A denote the 3 × 3 matrix (15); we denote the associated null
vector (16) by $f_A$. Obviously, f depends linearly on A.


(a) Let A and B denote two 3 × 3 antisymmetric matrices. Show that


$\operatorname{tr} AB = -2(f_A, f_B)$,


where ( , ) denotes the standard scalar product for vectors in $\mathbb{R}^3$.

EXERCISE 9. Show that the cross product can be expressed as


$f_{[A,B]} = f_A \times f_B$.


2. THE KINEMATICS OF FLUID FLOW 


The concept of angular velocity vector is also useful for discussing motions that are 
not rigid, such as the motion of fluids. We describe the motion of a fluid by 


$x = x(y, t)$; (19)


here x denotes the position of a point in the fluid at time t that at time zero was
located at y:


$x(y, 0) = y$. $(19)_0$


The partial derivative of x with respect to t, y fixed, is the velocity v of the flow:


$\frac{\partial}{\partial t} x(y, t) = x_t(y, t) = v(y, t)$. (20)


The mapping y → x, t fixed, is described locally by the Jacobian matrix


$J(y, t) = \frac{\partial x}{\partial y}$, that is, $J_{ij} = \frac{\partial x_i}{\partial y_j}$. (21)


It follows from $(19)_0$ that


$J(y, 0) = I$. $(21)_0$

We learn in the integral calculus of functions of several variables that the
determinant of the Jacobian J(y, t) is the factor by which the volume of the fluid
initially at y is expanded at time t. We assume that the fluid is never compressed to
zero. Since at t = 0, det J(y, 0) = det I = 1 is positive, it follows that det J(y, t) is
positive for all t.

We appeal now to Theorem 22' of Chapter 10 to factor the matrix J as


$J = MS$, (22)


M = M(y, t) a rotation, S = S(y, t) self-adjoint and positive. Since J is real, so are M
and S. Since det J and det S are positive, so is det M. Since J(t) → I as t → 0, it
follows, see the proof of Theorem 22 in Chapter 10, that also S and M → I as t → 0.

It follows from the spectral theory of self-adjoint matrices that S acts as
compression or dilation along the three axes that are the eigenvectors of S. M is
rotation; we shall calculate now the rate of rotation by the action of M. To do this we
differentiate (22) with respect to t:


$J_t = MS_t + M_t S$. (22)'

We multiply (22) by $M^*$ on the left; since $M^*M = I$ we get

$M^*J = S$.


We multiply this relation by $M_t$ from the left, make use of the differential equation
$M_t = AM$, see (11), and that $MM^* = I$:


$M_t S = AMM^*J = AJ$.

Setting this into (22)' gives

$J_t = MS_t + AJ$. (23)

Set t = 0:

$J_t(0) = S_t(0) + A(0)$. $(23)_0$


We recall from (10) that A(0) is anti-self-adjoint. $S_t$, on the other hand, being the
derivative of self-adjoint matrices, is itself self-adjoint. Thus $(23)_0$ is the
decomposition of $J_t(0)$ into its self-adjoint and anti-self-adjoint parts.
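Differentiating (21) with respect to t and using (20) gives


$J_t = \frac{\partial v}{\partial y}$, (24)


that is,


$(J_t)_{ij} = \frac{\partial v_i}{\partial y_j}$. (24)'

Thus the self-adjoint and anti-self-adjoint parts of $J_t(0)$ are


$(S_t)_{ij}(0) = \frac{1}{2}\left(\frac{\partial v_i}{\partial y_j} + \frac{\partial v_j}{\partial y_i}\right)$, (25)

$A_{ij}(0) = \frac{1}{2}\left(\frac{\partial v_i}{\partial y_j} - \frac{\partial v_j}{\partial y_i}\right)$. (25)'

In (15) we have given the names a, b, c to the entries of A:

$a = \frac{1}{2}\left(\frac{\partial v_1}{\partial y_2} - \frac{\partial v_2}{\partial y_1}\right)$, $b = \frac{1}{2}\left(\frac{\partial v_1}{\partial y_3} - \frac{\partial v_3}{\partial y_1}\right)$, $c = \frac{1}{2}\left(\frac{\partial v_2}{\partial y_3} - \frac{\partial v_3}{\partial y_2}\right)$.

Set this into formula (16) for the instantaneous angular velocity vector:

$f = \frac{1}{2} \begin{pmatrix} \frac{\partial v_3}{\partial y_2} - \frac{\partial v_2}{\partial y_3} \\ \frac{\partial v_1}{\partial y_3} - \frac{\partial v_3}{\partial y_1} \\ \frac{\partial v_2}{\partial y_1} - \frac{\partial v_1}{\partial y_2} \end{pmatrix} = \frac{1}{2} \operatorname{curl} v$. (26)


In words: A fluid that is flowing with velocity v has instantaneous angular velocity
equal to $\frac{1}{2}\operatorname{curl} v$, called its vorticity. A flow for which curl v = 0 is called
irrotational.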
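
Formulas (25), (25)', and (26) can be verified on a simple example. The sketch below assumes NumPy and a hypothetical linear velocity field v(y) = Gy, for which the matrix of partial derivatives in (24)' is the constant matrix G.

```python
import numpy as np

# Hypothetical linear velocity field v(y) = G y, so that G[i, j] = dv_i/dy_j.
G = np.array([[0.2, -1.0, 0.5],
              [0.3, 0.1, 2.0],
              [-0.7, 0.4, -0.3]])

S_t = (G + G.T) / 2                       # self-adjoint part, formula (25)
A0 = (G - G.T) / 2                        # anti-self-adjoint part, formula (25)'

a, b, c = A0[0, 1], A0[0, 2], A0[1, 2]    # names of the entries as in (15)
f = np.array([-c, b, -a])                 # angular velocity vector, formula (16)

# curl v written out in components for the linear field.
curl_v = np.array([G[2, 1] - G[1, 2],
                   G[0, 2] - G[2, 0],
                   G[1, 0] - G[0, 1]])
div_v = np.trace(G)                       # divergence of the linear field
print(f, curl_v / 2, div_v)
```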

We recall from advanced calculus that a vector field v whose curl is zero can, in
any simply connected domain, be written as the gradient of some scalar function $\phi$.
Thus for an irrotational flow, the velocity is

$v = \operatorname{grad} \phi$;


$\phi$ is called the velocity potential.

We calculate now the rate at which the fluid is being expanded. We saw earlier
that expansion is det J. Therefore the rate at which fluid is expanded is (d/dt) det J. In
Chapter 9, Theorem 4, we have given a formula, equation (10), for the logarithmic
derivative of the determinant:


$\frac{d}{dt} \log \det J = \operatorname{tr}(J^{-1} J_t)$. (27)


We set t = 0; according to $(21)_0$, J(0) = I; therefore we can rewrite equation (27) as


$\frac{d}{dt} \det J(0) = \operatorname{tr} J_t(0)$.

By (24)', $(J_t)_{ij} = \partial v_i / \partial y_j$; therefore


$\frac{d}{dt} \det J = \sum_i \frac{\partial v_i}{\partial y_i} = \operatorname{div} v$. (27)'

In words: A fluid that is flowing with velocity v is being expanded at the rate div v.
That is why the velocity field of an incompressible fluid is divergence free.


3. THE FREQUENCY OF SMALL VIBRATIONS 


By small vibrations we mean motions of small amplitude about a point of 
equilibrium. Since the amplitude is small, the equation of motion can be taken to be 
linear. Let us start with the one-dimensional case, the vibration of a mass m under the 
action of a spring. Denote by x = x(t) the displacement of the mass from equilibrium
x = 0. The force of the spring, restoring the mass toward equilibrium, is taken to
be $-kx$, k a positive constant. Newton's law of motion, force equals mass times
acceleration, says that


$m\ddot{x} + kx = 0$; (28)


here the dot symbol denotes differentiation with respect to t.
Multiply (28) by $\dot{x}$:


$m\dot{x}\ddot{x} + kx\dot{x} = \frac{d}{dt}\left[\frac{1}{2} m \dot{x}^2 + \frac{k}{2} x^2\right] = 0$;

therefore


$\frac{1}{2} m \dot{x}^2 + \frac{k}{2} x^2 = E$ (29)

is a constant, independent of t. The first term in (29) is the kinetic energy of a mass m
moving with velocity $\dot{x}$; the second term is the potential energy stored in a spring
displaced by the amount x. That their sum, E, is constant expresses the conservation
of total energy.
The equation of motion (28) can be solved explicitly: All solutions are of the form


$x(t) = a \sin\left(\sqrt{\frac{k}{m}}\, t + \theta\right)$; (30)


a is called the amplitude, $\theta$ the phase. All solutions (30) are periodic in t, with period
$p = 2\pi\sqrt{m/k}$. The frequency, defined as the reciprocal of the period, is the number
of vibrations the system performs per unit time:


$\text{frequency} = \frac{1}{2\pi}\sqrt{\frac{k}{m}}$. (31)
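
The conservation law (29), the solution (30), and the frequency (31) can be checked with a few lines of code. The sketch below assumes NumPy; the values of the mass, spring constant, amplitude, and phase are hypothetical.

```python
import numpy as np

m, k = 2.0, 8.0                 # hypothetical mass and spring constant
amp, phase = 0.5, 0.3           # hypothetical amplitude a and phase theta
omega = np.sqrt(k / m)

def x(t):
    """Solution (30)."""
    return amp * np.sin(omega * t + phase)

def x_dot(t):
    """Velocity, the derivative of (30) with respect to t."""
    return amp * omega * np.cos(omega * t + phase)

def energy(t):
    """Total energy, the left side of (29)."""
    return 0.5 * m * x_dot(t) ** 2 + 0.5 * k * x(t) ** 2

period = 2 * np.pi * np.sqrt(m / k)
frequency = 1 / period          # formula (31)

ts = np.linspace(0.0, 10.0, 201)
energies = energy(ts)
print(frequency, energies.min(), energies.max())
```

For these values the energy equals k a²/2 = 1 at every sampled time, up to rounding.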

We note that part of this result can be deduced by dimensional analysis. From the
fact that kx is a force, we deduce that


$\dim k \cdot \text{length} = \dim \text{force} = \text{mass} \cdot \text{acceleration} = \frac{\text{mass} \cdot \text{length}}{\text{time}^2}$.

So

$\dim k = \frac{\text{mass}}{\text{time}^2}$.


The only quantity constructed out of the two parameters m and k whose dimension is
time is const $\sqrt{m/k}$. So we conclude that the period p of motion is given by


$p = \text{const}\,\sqrt{m/k}$.


Formula (31) shows that frequency is an increasing function of k, and a decreasing
function of m. Intuitively this is clear; increasing k makes the spring stiffer and the
vibration faster; the smaller the mass, the faster the vibration.

We present now a far-reaching generalization of this result to the motion of a
system of n masses on a line, each linked elastically to each other and to the origin.
Denote by $x_i$ the position of the ith particle; Newton's second law of motion for the
ith particle is


$m_i \ddot{x}_i - f_i = 0$, (32)

where $f_i$ is the total force acting on the ith particle and $m_i$ is its mass. We take the
origin to be a point of equilibrium for the system, that is, all $f_i$ are zero when all the $x_i$
are zero.

We denote by $f_{ij}$ the force exerted by the jth particle on the ith. According to
Newton's third law, the force exerted by the ith particle on the jth is $-f_{ij}$. We take $f_{ij}$
to be proportional to the distance of $x_i$ and $x_j$:


$f_{ij} = k_{ij}(x_j - x_i)$, $i \ne j$. (33)


To satisfy $f_{ij} = -f_{ji}$ we take $k_{ij} = k_{ji}$. Finally, we take the force exerted from the
origin on particle i to be $-k_i x_i$. Altogether we have


$f_i = \sum_j k_{ij} x_j$, $k_{ii} = -k_i - \sum_{j \ne i} k_{ij}$. (33)'


We now rewrite the system (32) in matrix form as


$M\ddot{x} + Kx = 0$; (32)'
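
The passage from the pairwise forces (33) to the matrix K of (32)' can be sketched as follows, assuming NumPy. The entries of K are the negatives of the coefficients k_ij in (33)', so the total force is f = -Kx. The function names and the spring constants below are hypothetical.

```python
import numpy as np

def stiffness(k_pair, k_origin):
    """K of (32)': K[i, j] = -k_ij for i != j, K[i, i] = k_i + sum over j != i of k_ij."""
    K = -np.asarray(k_pair, dtype=float)
    np.fill_diagonal(K, 0.0)
    K[np.diag_indices_from(K)] = np.asarray(k_origin, dtype=float) - K.sum(axis=1)
    return K

def forces_direct(x, k_pair, k_origin):
    """Total force on each particle, summed from (33) and the pull to the origin."""
    n = len(x)
    f = np.zeros(n)
    for i in range(n):
        f[i] = -k_origin[i] * x[i]
        for j in range(n):
            if j != i:
                f[i] += k_pair[i][j] * (x[j] - x[i])
    return f

# Hypothetical symmetric coupling constants k_ij and origin constants k_i.
k_pair = np.array([[0.0, 1.0, 0.5],
                   [1.0, 0.0, 2.0],
                   [0.5, 2.0, 0.0]])
k_origin = np.array([0.3, 0.2, 0.4])

K = stiffness(k_pair, k_origin)
x = np.array([0.1, -0.2, 0.05])
print(K)
print(-K @ x, forces_direct(x, k_pair, k_origin))
```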


here x denotes the vector $(x_1, x_2, \ldots, x_n)$, M is a diagonal matrix with entries $m_i$, and
the elements of K are $-k_{ij}$ from (33)'. The matrix K is real and symmetric; then
taking the scalar product of (32)' with $\dot{x}$ we obtain


$(\dot{x}, M\ddot{x}) + (\dot{x}, Kx) = 0$.

Using the symmetry of K and M we can rewrite this as


$\frac{d}{dt}\left[\frac{1}{2}(\dot{x}, M\dot{x}) + \frac{1}{2}(x, Kx)\right] = 0$,


from which we conclude that


$\frac{1}{2}(\dot{x}, M\dot{x}) + \frac{1}{2}(x, Kx) = E$ (34)


is a constant independent of t. The first term on the left-hand side is the kinetic
energy of the masses, the second term the potential energy stored in the system when
the particles have been displaced from the origin to x. That their sum, E, is constant
during the motion is an expression of the conservation of total energy.

We assume now that all the forces are attractive, that is, that $k_{ij}$ and $k_i$ are positive.
We claim that then the matrix K is positive. For proof see Theorem 5 at the end of
this chapter. According to inequality (5)' of Chapter 10, a positive matrix K satisfies
for all x,


$a\|x\|^2 \le (x, Kx)$, a positive.


Since the diagonal matrix M is positive, combining the above inequality with (34)
gives


$\frac{a}{2}\|x\|^2 \le E$.


This shows that the amplitude $\|x\|$ is uniformly bounded for all time, and
furthermore if the total energy E is sufficiently small, the amplitude $\|x\|$ is small.
A second important consequence of the positivity of K is


Theorem 2. Solutions of the differential equation (32) are uniquely
determined by their initial data x(0) and $\dot{x}(0)$. That is, two solutions that have the
same initial data are equal for all time.


Proof. Since equation (32) is linear, the difference of two solutions is again a
solution. Therefore it is sufficient to prove that if a solution x has zero initial data,
then x(t) is zero for all t. To see this, we observe that if x(0) = 0, $\dot{x}(0) = 0$, then
energy E at t = 0 is zero. Therefore energy is zero for all t. But energy defined by
(34) is the sum of two nonnegative terms; therefore each is zero for all t. □

Since equation (32) is linear, its solutions form a linear space. We shall show that
the dimension of this space of solutions is $\le 2n$, where n is the number of masses. To
see this, map each solution x(t) into its initial data x(0), $\dot{x}(0)$. Since there are n
particles, their initial data belong to a 2n-dimensional linear space. This mapping is
linear; we claim that it is 1-to-1. According to Theorem 2, two solutions with the
same initial data are equal; in particular the nullspace of this map is {0}. Then it
follows from Theorem 1 of Chapter 3 that the dimension of the space of solutions is
$\le 2n$.

We turn now to finding all solutions of the equations of motion (32)'. Since the 
matrices M and K are constant, differentiating equation (32) with respect to t gives 

$$M\dddot x + K\dot x = 0.$$

In words: If x(t) is a solution of (32), so is $\dot x(t)$. 

The solutions of (32) form a finite-dimensional space. The mapping $x \to \dot x$ maps 
this space into itself. According to the spectral theorem, the eigenfunctions and 
generalized eigenfunctions of this mapping span the space. 

Eigenfunctions of the map $x \to \dot x$ satisfy the equation $\dot x = ax$; the solutions of this 
are $x(t) = e^{at}h$, where a is a complex number, h is a vector with n components, and 
n is the number of particles. Since we have shown above that each solution of (32)' is 
uniformly bounded for all t, it follows that a is pure imaginary: a = ic, c real. To 
determine c and h we set $x = e^{ict}h$ into (32)'. We get, after dividing by $e^{ict}$, that 

$$c^2 Mh = Kh. \qquad (35)$$


This is an eigenvalue problem we have already encountered in Chapter 8, equation 
(48). We can reduce (35) to a standard eigenvalue problem by introducing 
$M^{1/2}h = k$ as new unknown vector into (35) and then multiplying equation (35) on 
the left by $M^{-1/2}$. We get 

$$c^2 k = M^{-1/2}KM^{-1/2}k. \qquad (35)'$$

Since $M^{-1/2}KM^{-1/2}$ is self-adjoint, it has n linearly independent eigenvectors 
$k_1, \ldots, k_n$, with corresponding eigenvalues $c_1^2, \ldots, c_n^2$. Since, as we shall show, K is a 
positive matrix, so is $M^{-1/2}KM^{-1/2}$. Therefore the $c_j$ are real numbers; we take them 
to be positive. 

The corresponding n solutions of the differential equation (32)' are $e^{ic_j t}h_j$, where 
$h_j = M^{-1/2}k_j$, whose real and imaginary parts also are solutions: 

$$(\cos c_j t)\,h_j, \qquad (\sin c_j t)\,h_j, \qquad (36)$$

as are all linear combinations of them: 

$$x(t) = \sum_j a_j(\cos c_j t)\,h_j + \sum_j b_j(\sin c_j t)\,h_j; \qquad (36)'$$

the $a_j$ and $b_j$ are arbitrary real numbers. 

Theorem 3. Every solution of the differential equation (32) is of form (36)'. 

Proof. Solutions of the form (36)' form a 2n-dimensional space. We have shown 
that the set of all solutions is a linear space of dimension ≤ 2n. It follows that all 
solutions are of form (36)'. □ 


EXERCISE 10. Verify that solutions of the form (36)' form a 2n-dimensional 
linear space. 


The special solutions (36) are called normal modes; each is periodic, with period 
$2\pi/c_j$ and frequency $c_j/2\pi$. These are called the natural frequencies of the 
mechanical system governed by equation (32). 
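The recipe of (35), (35)' and (36)' can be tried out on the smallest nontrivial case. The following is a minimal sketch, assuming two masses m1, m2 tied to the origin by springs k1, k2 and to each other by a spring k12; these names, and the helper functions, are illustrative and not from the text. It uses the closed-form eigenvalues of a symmetric 2 × 2 matrix in place of a general eigenvalue routine.

```python
import math

def normal_modes(m1, m2, k1, k2, k12):
    """Return (K, [(c, h), ...]) for two masses with M x'' + K x = 0.

    Assumes k12 > 0 so that the off-diagonal entry of K is nonzero.
    """
    # Stiffness matrix built as in (40).
    K = [[k1 + k12, -k12], [-k12, k2 + k12]]
    # Entries of the symmetric matrix M^{-1/2} K M^{-1/2} of (35)'.
    a = K[0][0] / m1
    d = K[1][1] / m2
    b = K[0][1] / math.sqrt(m1 * m2)
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    modes = []
    for lam in (mean - radius, mean + radius):  # the eigenvalues c^2
        kvec = (b, lam - a)                     # eigenvector of [[a, b], [b, d]]
        h = (kvec[0] / math.sqrt(m1), kvec[1] / math.sqrt(m2))  # h = M^{-1/2} k
        modes.append((math.sqrt(lam), h))
    return K, modes

def energy(t, masses, K, modes, a, b):
    """Energy (34) of x(t) = sum_j (a_j cos c_j t + b_j sin c_j t) h_j."""
    x = [0.0, 0.0]
    v = [0.0, 0.0]
    for (c, h), aj, bj in zip(modes, a, b):
        for i in range(2):
            x[i] += (aj * math.cos(c * t) + bj * math.sin(c * t)) * h[i]
            v[i] += c * (-aj * math.sin(c * t) + bj * math.cos(c * t)) * h[i]
    kinetic = 0.5 * sum(masses[i] * v[i] ** 2 for i in range(2))
    potential = 0.5 * sum(x[i] * K[i][j] * x[j] for i in range(2) for j in range(2))
    return kinetic + potential

masses = (1.0, 2.0)
K, modes = normal_modes(masses[0], masses[1], k1=1.0, k2=1.0, k12=0.5)
for c, h in modes:
    print("c =", c, " period =", 2 * math.pi / c, " h =", h)
```

Evaluating `energy` at several times for fixed coefficients gives the same value up to rounding, which is the conservation law (34) seen on the explicit solution (36)'.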

Theorem 4. Consider two differential equations of form (32)': 

$$M\ddot x + Kx = 0, \qquad N\ddot y + Ly = 0, \qquad (37)$$

M, K, N, L positive, real n × n matrices. Suppose that 

$$M \ge N \quad\text{and}\quad K \le L. \qquad (38)$$

Denote 2π times the natural frequencies of the first system, arranged in increasing 
order, by $c_1 \le \cdots \le c_n$, and those of the second system by $d_1 \le \cdots \le d_n$. We claim 
that 

$$c_j \le d_j, \qquad j = 1, \ldots, n. \qquad (39)$$
Proof. We introduce an intermediate differential equation 

$$M\ddot z + Lz = 0.$$

Denote its natural frequencies by $f_j/2\pi$. In analogy with equation (35), the $f_j$ satisfy 

$$f^2 Mh = Lh,$$

where h is an eigenvector. In analogy with equation (35)', we can identify the 
numbers $f_j^2$ as the eigenvalues of 

$$M^{-1/2}LM^{-1/2}.$$

We recall that the numbers $c_j^2$ are eigenvalues of 

$$M^{-1/2}KM^{-1/2}.$$

Since K is assumed to be ≤ L, it follows from Theorem 1 of Chapter 10 that also 

$$M^{-1/2}KM^{-1/2} \le M^{-1/2}LM^{-1/2}.$$

Then it follows from Theorem 16 of Chapter 10 that 

$$c_j^2 \le f_j^2, \qquad j = 1, \ldots, n. \qquad (39)'$$

On the other hand, in analogy with equation (35)', we can identify the reciprocals 
$1/f_j^2$ as the eigenvalues of 

$$L^{-1/2}ML^{-1/2},$$

whereas the reciprocals $1/d_j^2$ are the eigenvalues of 

$$L^{-1/2}NL^{-1/2}.$$

Since N is assumed to be ≤ M, it follows as before that 

$$L^{-1/2}NL^{-1/2} \le L^{-1/2}ML^{-1/2},$$

so by Theorem 16 of Chapter 10 

$$\frac{1}{d_j^2} \le \frac{1}{f_j^2}, \qquad j = 1, \ldots, n. \qquad (39)''$$

We can combine inequalities (39)' and (39)'' to deduce (39). □ 


Note: If either of the inequalities in (38) is strict, then all the inequalities in (39) 
are strict. 

The intuitive meaning of Theorem 4 is that if in a mechanical system we stiffen 
the forces binding the particles to each other and reduce the mass of all the particles, 
then all natural frequencies of the system increase. 
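A small numerical check of Theorem 4 may help fix the direction of the inequalities. The sketch below assumes two masses with diagonal mass matrices and stiffness matrices of the form (40); the function names and the numerical values are illustrative choices, not from the text. The second system has smaller masses and stiffer springs, so (38) holds.

```python
import math

def stiffness(k1, k2, k12):
    """2 x 2 stiffness matrix of the form (40)."""
    return [[k1 + k12, -k12], [-k12, k2 + k12]]

def frequencies(masses, K):
    """Sorted c_1 <= c_2 solving c^2 M h = K h, with M = diag(masses)."""
    m1, m2 = masses
    # Entries of M^{-1/2} K M^{-1/2}; its eigenvalues are the c_j^2.
    a = K[0][0] / m1
    d = K[1][1] / m2
    b = K[0][1] / math.sqrt(m1 * m2)
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    return [math.sqrt(mean - radius), math.sqrt(mean + radius)]

# First system: heavier masses, softer springs.
c = frequencies((2.0, 3.0), stiffness(1.0, 1.0, 0.5))
# Second system: lighter masses (N <= M), stiffer springs (K <= L).
d = frequencies((1.0, 2.0), stiffness(2.0, 1.5, 1.0))
print("c =", c)
print("d =", d)
```

Here L − K is again a matrix of the form (40) with positive constants, so K ≤ L holds, and each printed $c_j$ is no larger than the corresponding $d_j$.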

We supply now the proof of the positivity of the matrix K. 


Theorem 5. Suppose that the numbers $k_i$ and $k_{ij}$, $i \ne j$, are positive. Then the 
symmetric matrix K, 

$$K_{ij} = -k_{ij}, \quad i \ne j; \qquad K_{ii} = k_i + \sum_{j \ne i} k_{ij}, \qquad (40)$$

is positive. 

Proof. It suffices to show that every eigenvalue a of K is positive: 

$$Ku = au. \qquad (41)$$

Normalize the eigenvector u of K so that the largest component, say $u_i$, equals 1, and 
all others are ≤ 1. The ith component of (41) is 

$$K_{ii} + \sum_{j \ne i} K_{ij}u_j = a.$$

Using the definition (40) of the entries of K, this can be rewritten as 

$$k_i + \sum_{j \ne i} k_{ij}(1 - u_j) = a.$$

The left-hand side is positive; therefore, so is the right-hand side a. □ 


For a more general result, see Appendix 7. 
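As a cross-check on Theorem 5, by a different route than the eigenvalue argument of the proof, one can expand the quadratic form of K directly from the definition (40). The computation assumes, as the symmetry of K requires, that $k_{ij} = k_{ji}$:

```latex
\begin{align*}
(x, Kx) &= \sum_i K_{ii} x_i^2 + \sum_{i \ne j} K_{ij} x_i x_j \\
        &= \sum_i k_i x_i^2 + \sum_{i \ne j} k_{ij} x_i^2 - \sum_{i \ne j} k_{ij} x_i x_j \\
        &= \sum_i k_i x_i^2 + \sum_{i < j} k_{ij} (x_i - x_j)^2 .
\end{align*}
```

Every term on the last line is nonnegative, and since all $k_i$ are positive the first sum vanishes only for x = 0; so (x, Kx) > 0 for every nonzero x.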


CHAPTER 12 


Convexity 


Convexity is a primitive notion, based on nothing but the bare bones of the structure 
of linear spaces over the reals. Yet some of its basic results are surprisingly deep; 
furthermore, these results make their appearance in an astonishingly wide variety of 
topics. 

X is a linear space over the reals. For any pair of vectors x, y in X, the line segment 
with endpoints x and y is defined as the set of points in X of form 

$$ax + (1 - a)y, \qquad 0 \le a \le 1. \qquad (1)$$


Definition. A set K in X is called convex if, whenever x and y belong to K, all 
points of the line segment with endpoints x, y also belong to K. 


Examples of Convex Sets 


(a) K = the whole space X. 

(b) K = ∅, the empty set. 

(c) K = {x}, a single point. 

(d) K = any line segment. 

(e) Let l be a linear function in X; then the sets 

$$l(x) = c, \quad\text{called a hyperplane,} \qquad (2)$$
$$l(x) < c, \quad\text{called an open half-space,} \qquad (3)$$
$$l(x) \le c, \quad\text{called a closed half-space,} \qquad (4)$$

are all convex sets. 

Concrete Examples of Convex Sets 

(f) X the space of all polynomials with real coefficients, K the subset of all 
polynomials that are positive at every point of the interval (0, 1). 

(g) X the space of real, self-adjoint matrices, K the subset of positive matrices. 

EXERCISE 1. Verify that these are convex sets. 

Theorem 1. (a) The intersection of any collection of convex sets is convex. 
(b) The sum of two convex sets is convex, where the sum of two sets K and H is 
defined as the set of all sums x + y, x in K, y in H. 


EXERCISE 2. Prove these propositions. 


Using Theorem 1, we can build an astonishing variety of convex sets out of a few 
basic ones. For instance, a triangle in the plane is the intersection of three half-planes. 


Definition. A point x is called an interior point of a set S in X if for every y in X, 
x + ty belongs to S for all sufficiently small positive t. 


Definition. A convex set K in X is called open if every point in it is an interior point. 

EXERCISE 3. Show that an open half-space (3) is an open convex set. 


EXERCISE 4. Show that if A is an open convex set and B is convex, then A + B is 
open and convex. 


Definition. Let K be an open convex set that contains the vector 0. We define its 
gauge function $p_K = p$ as follows: For every x in X, 

$$p(x) = \inf r, \qquad r > 0 \ \text{and}\ x/r \ \text{in}\ K. \qquad (5)$$


Exercise 5. Let X be a Euclidean space, and let K be the open ball of radius a 
centered at the origin: || x || < a. 


(i) Show that K is a convex set. 
(ii) Show that the gauge function of K is p(x) = || x ||/a. 


EXERCISE 6. In the (u, v) plane take K to be the quarter-plane u < 1, v < 1. 
Show that the gauge function of K is 

$$p(u, v) = \begin{cases} 0 & \text{if } u \le 0,\ v \le 0, \\ v & \text{if } 0 < v,\ u \le 0, \\ u & \text{if } 0 < u,\ v \le 0, \\ \max(u, v) & \text{if } 0 < u,\ 0 < v. \end{cases}$$

Theorem 2. (a) The gauge function p of an open convex set K that contains the 
origin is well-defined for every x. 
(b) p is positive homogeneous: 

$$p(ax) = a\,p(x) \quad\text{for } a > 0. \qquad (6)$$

(c) p is subadditive: 

$$p(x + y) \le p(x) + p(y). \qquad (7)$$

(d) p(x) < 1 iff x is in K. 

Proof. Call the set of r > 0 for which x/r is in K admissible for x. To prove 
(a) we have to show that for any x the set of admissible r is nonempty. This follows 
from the assumption that 0 is an interior point of K. 

(b) follows from the observation that if r is admissible for x and a > 0, then ar is 
admissible for ax. 

(c) Let s and t be positive numbers such that 

$$p(x) < s, \qquad p(y) < t. \qquad (8)$$

Then by definition of p as inf, it follows that s and t are admissible for x and y; 
therefore x/s and y/t belong to K. The point 

$$\frac{x + y}{s + t} = \frac{s}{s + t}\,\frac{x}{s} + \frac{t}{s + t}\,\frac{y}{t} \qquad (9)$$

lies on the line segment connecting x/s and y/t. By convexity, (x + y)/(s + t) belongs 
to K. This shows that s + t is admissible for x + y; so by definition of p, 

$$p(x + y) \le s + t. \qquad (10)$$

Since s and t can be chosen arbitrarily close to p(x) and p(y), (c) follows. 

(d) Suppose p(x) < 1; by definition there is an admissible r < 1. Since r is 
admissible, x/r belongs to K. The identity x = r(x/r) + (1 − r)0 shows that x lies on the 
line segment with endpoints 0 and x/r, so by convexity belongs to K. 

Conversely, suppose x belongs to K; since x is assumed to be an interior point of K, 
the point x + εx belongs to K for ε > 0 but small enough. This shows that 
r = 1/(1 + ε) is admissible, and so by definition 

$$p(x) \le \frac{1}{1 + \epsilon} < 1.$$

This completes the proof of the theorem. □ 

EXERCISE 7. Let p be a positive homogeneous, subadditive function. Prove that 
the set K consisting of all x for which p(x) < 1 is convex and open. 
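Definition (5) lends itself to a direct numerical approximation. The sketch below assumes that K is given only through a membership test `contains` (a hypothetical callable, not something defined in the text) and uses the fact, which follows from convexity and 0 being in K, that the admissible r form an upper ray, so bisection applies.

```python
import math

def gauge(x, contains, tol=1e-12):
    """Approximate p(x) = inf{r > 0 : x/r in K} for an open convex K containing 0.

    `contains(point)` is an assumed membership oracle for K.
    """
    def scaled(r):
        return tuple(xi / r for xi in x)

    # Grow hi until it is admissible; possible because 0 is an interior point.
    hi = 1.0
    while not contains(scaled(hi)):
        hi *= 2.0
    lo = 0.0
    # Invariant: hi is admissible; lo is 0 or not admissible.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if contains(scaled(mid)):
            hi = mid
        else:
            lo = mid
    return hi

def quarter_plane(point):
    """The set of Exercise 6: u < 1, v < 1."""
    return point[0] < 1 and point[1] < 1

def open_ball(point):
    """The set of Exercise 5 with radius a = 2."""
    return math.hypot(*point) < 2

print(gauge((2.0, -3.0), quarter_plane))
print(gauge((3.0, 4.0), open_ball))
```

The returned value overestimates p(x) by at most `tol`, so for a point with p(x) = 0 the result is a small positive number rather than exactly zero.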


Theorem 2 gives an analytical description of the open convex sets. There is another, 
dual description. To derive it we need the following basic, and geometrically intuitive 
results. 


Theorem 3. Let K be an open convex set, and let y be a point not in K. Then 
there is an open half-space containing K but not y. 


Proof. An open half-space is by definition a set of points satisfying inequality 
l(x) < c; see (3). So we have to construct a linear function l and a number c such that 

$$l(x) < c \quad\text{for all } x \text{ in } K, \qquad (11)$$
$$l(y) = c. \qquad (12)$$

We assume that 0 lies in K; otherwise shift K. Set x = 0 in (11): we get 0 < c. We 
may set c = 1. Let p be the gauge function of K; according to Theorem 2, points of K 
are characterized by p(x) < 1. It follows that (11) can be stated so: 

$$\text{If } p(x) < 1, \text{ then } l(x) < 1. \qquad (11)'$$


This will certainly be the case if 

$$l(x) \le p(x) \quad\text{for all } x. \qquad (13)$$

So Theorem 3 is a consequence of the following: there exists a linear function l 
which satisfies (13) for all x and whose value at y is 1. We show first that the two 
requirements are compatible. Requiring l(y) = 1 implies by linearity that l(ky) = k 
for all k. We show now that (13) is satisfied for all x of form ky; that is, for all k, 

$$k = l(ky) \le p(ky). \qquad (14)$$

For k positive, we can by (6) rewrite this as 

$$k \le k\,p(y), \qquad (14)'$$

true because y does not belong to K and so by part (d) of Theorem 2, p(y) ≥ 1. On 
the other hand, inequality (14) holds for k negative: since the left-hand side is less 
than 0, the right-hand side, by definition (5) of gauge function, is nonnegative. 

The remaining task is to extend l from the line through y to all of X so that (13) is 
satisfied. The next theorem asserts that this can be done. 

Theorem 4 (Hahn-Banach). Let p be a real-valued positive homogeneous 
subadditive function defined on a linear space X over R. Let U be a subspace of X on 
which a linear function l is defined, satisfying (13): 

$$l(u) \le p(u) \quad\text{for all } u \text{ in } U. \qquad (13)_U$$

Then l can be extended to all of X so that (13) is satisfied for all x. 


Proof. Proof is by induction; we show that l can be extended to a subspace V 
spanned by U and any vector z not in U. That is, V consists of all vectors of form 

$$v = u + tz, \qquad u \text{ in } U, \quad t \text{ any real number}.$$

Since l is linear, 

$$l(v) = l(u) + t\,l(z);$$

this shows that the value of l(z) = a determines the value of l on V: 

$$l(v) = l(u) + ta.$$

The task is to choose a so that (13) is satisfied: l(v) ≤ p(v), that is, 

$$l(u) + ta \le p(u + tz) \qquad (13)_V$$

for all u in U and all real t. 

We divide $(13)_V$ by |t|. For t > 0, using positive homogeneity of p and linearity of 
l we get 

$$l(u^*) + a \le p(u^* + z), \qquad (14)_+$$

where $u^*$ denotes u/t. For t < 0 we obtain 

$$l(u^{**}) - a \le p(u^{**} - z), \qquad (14)_-$$

where $u^{**}$ denotes −u/t. Clearly, $(13)_V$ holds for all u in U and all real t iff $(14)_+$ and 
$(14)_-$ hold for all $u^*$ and $u^{**}$, respectively, in U. 

We rewrite $(14)_\pm$ as 

$$l(u^{**}) - p(u^{**} - z) \le a \le p(u^* + z) - l(u^*);$$

the number a has to be so chosen that this holds for all $u^*$ and $u^{**}$ in U. Clearly, this is 
possible iff every number on the left is less than or equal to any number on the right, 
that is, if 

$$l(u^{**}) - p(u^{**} - z) \le p(u^* + z) - l(u^*) \qquad (15)$$

for all $u^*$, $u^{**}$ in U. We can rewrite this inequality as 

$$l(u^{**}) + l(u^*) \le p(u^* + z) + p(u^{**} - z). \qquad (15)'$$

By linearity, the left-hand side can be written as $l(u^{**} + u^*)$; since $(13)_U$ holds, 

$$l(u^{**} + u^*) \le p(u^{**} + u^*).$$

Since p is subadditive, 

$$p(u^{**} + u^*) = p(u^{**} - z + u^* + z) \le p(u^{**} - z) + p(u^* + z).$$

This proves (15)', which shows that l can be extended to V. Repeating this finitely 
many times, we extend l to the whole space X. □ 


This completes the proof of Theorem 3. □ 


Note. The Hahn-Banach Theorem holds in infinite-dimensional spaces. The 
proof is the same, with some added logical prestidigitation. 

The following result is an easy extension of Theorem 3. 

Theorem 5. Let K and H be open convex sets that are disjoint. Then there is a 
hyperplane that separates them. That is, there is a linear function l and a constant d 
such that 

$$l(x) < d \ \text{on } K, \qquad l(y) > d \ \text{on } H.$$

Proof. Define the difference K − H to consist of all differences x − y, x in K, y in 
H. It is easy to verify that this is an open, convex set. Since K and H are disjoint, 
K − H does not contain the origin. Then by Theorem 3, with y = 0, and therefore 
c = 0, there is a linear function l that is negative on K − H: 

$$l(x - y) < 0 \quad\text{for } x \text{ in } K,\ y \text{ in } H.$$

We can rewrite this as 

$$l(x) < l(y) \quad\text{for all } x \text{ in } K,\ y \text{ in } H.$$

It follows from the completeness of real numbers that there is a number d such that 
for x in K, y in H, 

$$l(x) \le d \le l(y).$$

Since both K and H are open, the sign of equality cannot hold; this proves 
Theorem 5. □ 

We show next how to use Theorem 3 to give a dual description of open convex 
sets. 


Definition. Let S be any set in X. We define its support function $q_S$ on the dual 
X' of X as follows: 

$$q_S(l) = \sup_{x \text{ in } S} l(x), \qquad (16)$$

where l is any linear function. 

Remark. $q_S(l)$ may be ∞ for some l. 


EXERCISE 8. Prove that the support function $q_S$ of any set is subadditive; that is, 
it satisfies $q_S(m + l) \le q_S(m) + q_S(l)$ for all l, m in X'. 


EXERCISE 9. Let S and T be arbitrary sets in X. Prove that 
$q_{S+T}(l) = q_S(l) + q_T(l)$. 


EXERCISE 10. Show that $q_{S \cup T}(l) = \max\{q_S(l), q_T(l)\}$. 
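For a finite set S the supremum in (16) is a maximum, which makes the three identities of Exercises 8 to 10 easy to try on examples. The following sketch assumes points and linear functions are both given as coordinate tuples, with l(x) computed as a dot product; the sets and functionals below are arbitrary illustrative choices.

```python
def support(points, l):
    """q_S(l) = max of l(x) over the finite set S, with l given by its coefficients."""
    return max(sum(li * xi for li, xi in zip(l, x)) for x in points)

def set_sum(S, T):
    """The sum S + T: all x + y with x in S and y in T."""
    return [tuple(si + ti for si, ti in zip(s, t)) for s in S for t in T]

S = [(0, 0), (2, 1), (-1, 3)]
T = [(1, -1), (0, 4)]
l = (1, 2)
m = (-3, 1)
l_plus_m = tuple(li + mi for li, mi in zip(l, m))

print(support(S, l_plus_m), "<=", support(S, m) + support(S, l))   # Exercise 8
print(support(set_sum(S, T), l), "==", support(S, l) + support(T, l))  # Exercise 9
print(support(S + T, l), "==", max(support(S, l), support(T, l)))  # Exercise 10
```

Note that `S + T` in the last line is Python list concatenation, which represents the union of the two finite sets, while `set_sum` forms the sum of sets in the sense of Theorem 1.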


Theorem 6. Let K be an open convex set, $q_K$ its support function. Then x 
belongs to K iff 

$$l(x) < q_K(l) \qquad (17)$$

for all l in X'. 

Proof. It follows from definition (16) that for every x in K, $l(x) \le q_K(l)$ for 
every l; therefore the strict inequality (17) holds for all interior points x in K. To see 
the converse, suppose that y is not in K. Then by Theorem 3 there is an l such that 
l(x) < 1 for all x in K, but l(y) = 1. Thus 

$$l(y) = 1 \ge \sup_{x \text{ in } K} l(x) = q_K(l); \qquad (18)$$

this shows that y not in K fails to satisfy (17) for some l. This proves 
Theorem 6. □ 


Definition. A convex set K in X is called closed if every open segment 
ax + (1 − a)y, 0 < a < 1, that belongs to K has its endpoints x and y in K. 


Examples 


The whole space X is closed. 
The empty set is closed. 
A set consisting of a single point is closed. 
An interval of form (1) is closed. 


EXERCISE 11. Show that a closed half-space as defined by (4) is a closed convex 
set. 


EXERCISE 12. Show that the closed unit ball in Euclidean space, consisting of all 
points $\|x\| \le 1$, is a closed convex set. 


EXERCISE 13. Show that the intersection of closed convex sets is a closed 
convex set. 


Theorems 2, 3, and 6 have their analogue for closed convex sets. 


Theorem 7. Let K be a closed, convex set, and y a point not in K. Then there is a 
closed half-space that contains K but not y. 


Sketch of Proof. Suppose K contains the origin. If K has no interior points, it lies 
in a lower-dimensional subspace. If it has an interior point, we choose it to be the 
origin. Then the gauge function $p_K$ of K can be defined as before. If x belongs to K, 
we may choose in the definition (5) of $p_K$ the value r = 1; this shows that for x in K, 
$p_K(x) \le 1$. Conversely, if $p_K(x) < 1$, then by (5) x/r belongs to K for some r < 1. 
Since 0 belongs to K, by convexity so does x. If $p_K(x) = 1$, then for all r > 1, x/r 
belongs to K. Since K is closed, so does the endpoint x. This shows that K consists 
of all points x which satisfy $p_K(x) \le 1$. We then proceed as in the proof of 
Theorem 3. □ 


Theorem 7 can be rephrased as follows. 


Theorem 8. Let K be a closed, convex set, $q_K$ its support function. Then x 
belongs to K iff 

$$l(x) \le q_K(l) \qquad (19)$$

for all l in X'. 

EXERCISE 14. Complete the proof of Theorems 7 and 8. 


Both Theorems 6 and 8 describe convex sets as intersections of half-spaces, open 
and closed, respectively. 


Definition. Let S be an arbitrary set in X. The closed convex hull of S is defined 
as the intersection of all closed convex sets containing S. 

Theorem 9. The closed convex hull of any set S is the set of points x satisfying 
$l(x) \le q_S(l)$ for all l in X'. 


EXERCISE 15. Prove Theorem 9. 


Let $x_1, \ldots, x_m$ denote m points in X, and $p_1, \ldots, p_m$ denote m nonnegative 
numbers whose sum is 1: 

$$p_j \ge 0, \qquad \sum_{j=1}^m p_j = 1. \qquad (20)$$

Then 

$$y = \sum_{j=1}^m p_j x_j \qquad (20)'$$

is called a convex combination of $x_1, \ldots, x_m$. 


EXERCISE 16. Show that if $x_1, \ldots, x_m$ belong to a convex set, then so does any 
convex combination of them. 


Definition. A point of a convex set K that is not an interior point is called a 
boundary point of K. 


Definition. Let K be a closed, convex set. A point e of K is called an extreme 
point of K if it is not the interior point of a line segment in K. That is, x is not an 
extreme point of K if 

$$x = \frac{y + z}{2}, \qquad y \text{ and } z \text{ in } K, \quad y \ne z.$$


EXERCISE 17. Show that an interior point of K cannot be an extreme point. 


All extreme points are boundary points of K, but not all boundary points are 
extreme points. Take, for example, K to be a convex polygon. All edges and vertices 
are boundary points, but only the vertices are extreme points. 

In three-dimensional space the set of extreme points need not be a closed set. 
Take K to be the convex hull of the points (0, 0, 1), (0, 0, −1) and the circle 
(1 + cos θ, sin θ, 0). The extreme points of K are all the above points except (0, 0, 0). 


Definition. A convex set K is called bounded if it does not contain a ray, that is, 
a set of points of the form x + ty, 0 ≤ t. 


Theorem 10 (Carathéodory). Let K be a nonempty closed bounded convex set 
in X, dim X = n. Then every point of K can be represented as a convex combination 
of at most (n + 1) extreme points of K. 

Proof. We prove this inductively on the dimension of X. We distinguish two 
cases. 


(i) K has no interior points. Suppose K contains the origin, which can always be 
arranged by shifting K appropriately. We claim that K does not contain n linearly 
independent vectors; for if it did, the convex combinations of these vectors and the 
origin would also belong to K; but these points constitute an n-dimensional simplex, 
full of interior points. Let m be the largest number of linearly independent vectors in 
K, and let $x_1, \ldots, x_m$ be m linearly independent vectors. Then m < n, and m being 
maximal, every other vector in K is a linear combination of $x_1, \ldots, x_m$. This proves 
that K is contained in an m-dimensional subspace of X. By the induction hypothesis, 
Theorem 10 holds for K. 

(ii) K has interior points. Denote by $K_0$ the set of all interior points of K. It is easy 
to show that $K_0$ is convex and that $K_0$ is open. We claim that K has boundary points; 
for, since K is bounded, any ray issuing from any interior point of K intersects K in an 
interval; since K is closed, the other endpoint is a boundary point y of K. 

Let y be a boundary point of K. We apply Theorem 3 to $K_0$ and y; clearly y does 
not belong to $K_0$, so there is a linear functional l such that 

$$l(y) = 1, \qquad l(x_0) < 1 \quad\text{for all } x_0 \text{ in } K_0. \qquad (21)$$

We claim that $l(x_1) \le 1$ for all $x_1$ in K. Pick any interior point $x_0$ of K; then all points 
x on the open segment bounded by $x_0$ and $x_1$ are interior points of K, and so by (21), 
l(x) < 1. It follows that at the endpoint $x_1$, $l(x_1) \le 1$. 

Denote by $K_1$ the set of those points x of K for which l(x) = 1. Being the 
intersection of two closed, convex sets, $K_1$ is closed and convex; since K is bounded, 
so is $K_1$. Equation (21) shows that y belongs to $K_1$, so $K_1$ is nonempty. 

We claim that every extreme point e of $K_1$ is also an extreme point of K; for, 
suppose that 

$$e = \frac{z + w}{2}, \qquad z \text{ and } w \text{ in } K.$$

Since e belongs to $K_1$, 

$$1 = l(e) = \frac{l(z) + l(w)}{2}. \qquad (22)$$

Both z and w are in K; as we have shown before, l(z) and l(w) are both less than or 
equal to 1. Combining this with (22), we conclude that 

$$l(z) = l(w) = 1.$$

This puts both z and w into $K_1$. But since e is an extreme point of $K_1$, z = w. This 
proves that extreme points of $K_1$ are extreme points of K. 

Since $K_1$ lies in a hyperplane of dimension less than n, it follows from the 
induction assumption that $K_1$ has a sufficient number of extreme points, that is, every 
point in $K_1$ can be written as a convex combination of n extreme points of $K_1$. Since 
we have shown that extreme points of $K_1$ are extreme points of K, this proves 
Theorem 10 for boundary points of K. 

Let $x_0$ be an interior point of K. We take any extreme point e of K (the previous 
argument shows that there are such things) and look at the intersection of the line 
through $x_0$ and e with K. Being the intersection of two closed convex sets, of which 
one, K, is bounded, this intersection is a closed interval. Since e is an extreme point 
of K, e is one of the end points; denote the other end point by y. Clearly, y is a 
boundary point of K. Since by construction $x_0$ lies on this interval, it can be written in 
the form 

$$x_0 = py + (1 - p)e, \qquad 0 < p < 1. \qquad (23)$$

We have shown above that y can be written as a convex combination of n extreme 
points of K. Setting this into (23) gives a representation of $x_0$ as the convex 
combination of (n + 1) extreme points. The proof of Theorem 10 is complete. □ 
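To illustrate the last step of the proof in a simple case, take K to be the closed unit square in the plane (n = 2), whose extreme points are its four vertices. The particular point $x_0$ and vertex e below are chosen only for illustration:

```latex
\begin{align*}
x_0 &= (\tfrac12, \tfrac14), \qquad e = (0, 0), \qquad
y = (1, \tfrac12) \quad \text{(where the line through $e$ and $x_0$ leaves $K$)}, \\
y &= \tfrac12 (1, 0) + \tfrac12 (1, 1), \qquad
x_0 = \tfrac12\, y + \tfrac12\, e \quad \text{(this is (23) with $p = \tfrac12$)}, \\
x_0 &= \tfrac14 (1, 0) + \tfrac14 (1, 1) + \tfrac12 (0, 0).
\end{align*}
```

The boundary point y lies on an edge and needs n = 2 vertices; adding e gives a representation of $x_0$ by n + 1 = 3 extreme points, as the theorem asserts.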


We now give an application of Carathéodory’s theorem. 


Definition. An n × n matrix $S = (s_{ij})$ is called doubly stochastic if 

(i) $s_{ij} \ge 0$ for all i, j, 
(ii) $\sum_i s_{ij} = 1$ for all j,     (24) 
(iii) $\sum_j s_{ij} = 1$ for all i. 


Such matrices arise, as the name indicates, in probability theory. 
Clearly, the doubly stochastic matrices form a bounded, closed convex set in the 
space of all n x n matrices. 


Example. In Exercise 8 of Chapter 5 we defined the permutation matrix P 
associated with the permutation p of the integers (1, ..., n) as follows: 

$$P_{ij} = \begin{cases} 1, & \text{if } j = p(i), \\ 0, & \text{otherwise.} \end{cases} \qquad (25)$$

EXERCISE 18. Verify that every permutation matrix is a doubly stochastic 
matrix. 

Theorem 11 (Dénes König, Garrett Birkhoff). The permutation matrices are 
the extreme points of the set of doubly stochastic matrices. 


Proof. It follows from (i) and (ii) of (24) that no entry of a doubly stochastic 
matrix can be greater than 1. Thus $0 \le s_{ij} \le 1$. 
We claim that all permutation matrices P are extreme points; for, suppose 

$$P = \frac{A + B}{2},$$

A and B doubly stochastic. It follows that if an entry of P is 1, the corresponding 
entries of A and B both must be equal to 1, and if an entry of P is zero, so must be the 
corresponding entries of A and B. This shows that A = B = P. 

Next we show the converse. We start by proving that if S is doubly stochastic and 
has an entry which lies between 0 and 1, 

$$0 < s_{i_0 j_0} < 1, \qquad (26)_{00}$$

then S is not extreme. To see this we construct a sequence of entries, all of which lie 
between 0 and 1, and which lie alternatingly on the same row or on the same column. 
We choose $j_1$ so that 

$$0 < s_{i_0 j_1} < 1. \qquad (26)_{01}$$

This is possible because the sum of elements in the $i_0$th row must be = 1, and 
since $(26)_{00}$ holds. Similarly, since the sum of elements in the $j_1$st column = 1, and 
since $(26)_{01}$ holds, we can choose a row $i_1$ so that 

$$0 < s_{i_1 j_1} < 1. \qquad (26)_{11}$$

We continue in this fashion, until the same position is traversed twice. Thus a closed 
chain has been constructed. 

[Diagram: the closed chain of entries of S, running alternately along rows and columns.] 

We now define a matrix N as follows: 

(a) The entries of N are zero except for those points that lie on the chain. 

(b) The entries of N on the points of the chain are +1 and −1, in succession. 
The matrix N has the following property: 

(c) The row sums and column sums of N are zero. 
We now define two matrices A, B by 

$$A = S + \epsilon N, \qquad B = S - \epsilon N.$$

It follows from (c) that the row sums and column sums of A and B are both 1. By (a) 
and the construction the elements of S are positive at all points where N has a 
nonzero entry. It follows therefore that ε can be chosen so small that both A and B 
have nonnegative entries. This shows that A and B both are doubly stochastic. Since 
A ≠ B, and 

$$S = \frac{A + B}{2},$$

it follows that S is not an extreme point. 

It follows that extreme points of the set of doubly stochastic matrices have entries 
either 0 or 1. It follows from (24) that each row and each column has exactly one 1. It 
is easy to check that such a matrix is a permutation matrix. This completes the proof 
of the converse. □ 


Applying Theorem 10 in the situation described in Theorem 11, we conclude: 
Every doubly stochastic matrix can be written as a convex combination of 
permutation matrices: 


$$S = \sum_P c(P)\,P, \qquad c(P) \ge 0, \qquad \sum_P c(P) = 1.$$


EXERCISE 19. Show that, except for two dimensions, the representation of 
doubly stochastic matrices as convex combinations of permutation matrices is not 
unique. 
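One way to produce such a representation explicitly is a greedy "peeling" procedure; this is a standard constructive variant and not the argument used in the text. The sketch below assumes a small matrix with exact rational entries and searches all permutations by brute force, which is only practical for small n. The function name is illustrative. It relies on the fact that a nonzero nonnegative matrix with equal row and column sums always has a permutation supported on its positive entries.

```python
from fractions import Fraction
from itertools import permutations

def birkhoff_decomposition(S):
    """Return [(c, p), ...] with S = sum of c * P_p, where P_p[i][p[i]] = 1 as in (25)."""
    n = len(S)
    R = [[Fraction(v) for v in row] for row in S]  # exact remainder, starts as S
    terms = []
    while any(v != 0 for row in R for v in row):
        # Brute-force search for a permutation lying on positive entries of R.
        p = next(q for q in permutations(range(n))
                 if all(R[i][q[i]] > 0 for i in range(n)))
        # Peel off the largest multiple of P_p that keeps R nonnegative.
        c = min(R[i][p[i]] for i in range(n))
        for i in range(n):
            R[i][p[i]] -= c
        terms.append((c, p))
    return terms

half, quarter = Fraction(1, 2), Fraction(1, 4)
S = [[half, half, 0],
     [quarter, quarter, half],
     [quarter, quarter, half]]
terms = birkhoff_decomposition(S)
for c, p in terms:
    print(c, p)
```

Each pass zeroes at least one entry of the remainder, so the loop ends after at most n² passes. Searching the permutations in a different order generally yields a different decomposition, which is the non-uniqueness referred to in Exercise 19.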


Carathéodory's theorem has many applications in analysis. Its infinite- 
dimensional version is the Krein-Milman Theorem. 
The last item in the chapter is a kind of a dual of Carathéodory’s theorem. 


Theorem 12 (Helly). Let X be a linear space of dimension n over the reals. Let 
$\{K_1, \ldots, K_N\}$ be a collection of N convex sets in X. Suppose that every subcollection 
of n + 1 sets K has a nonempty intersection. Then all K in the whole collection have 
a common point. 

Proof (Radon). We argue by induction on N, the number of sets, starting with the 
trivial situation N = n + 1. Suppose that N > n + 1 and that the assertion is true for 
N − 1 sets. It follows that if we omit any one of the sets, say $K_i$, the rest have a point 
$x_i$ in common: 

$$x_i \in K_j, \qquad j \ne i. \qquad (27)$$

We claim that there are numbers $a_1, \ldots, a_N$, not all zero, such that 

$$\sum_{i=1}^N a_i x_i = 0 \qquad (28)$$

and 

$$\sum_{i=1}^N a_i = 0. \qquad (28)'$$

These represent n + 1 equations for the N unknowns. According to Corollary A' 
(concrete version) of Theorem 1 of Chapter 3, a homogeneous system of linear 
equations has a nontrivial (i.e., not all unknowns are equal to 0) solution if the 
number of equations is less than the number of unknowns. Since in our case n + 1 is 
less than N, (28) and (28)' have a nontrivial solution. 
It follows from (28)' that not all $a_i$ can be of the same sign; there must be some 
positive ones and some negative ones. Let us renumber them so that $a_1, \ldots, a_p$ are 
positive, the rest nonpositive. 

We define a by 

$$a = \sum_{i=1}^p a_i. \qquad (29)$$

Note that it follows from (28)' that 

$$a = -\sum_{i=p+1}^N a_i. \qquad (29)'$$

We define y by 

$$y = \frac{1}{a}\sum_{i=1}^p a_i x_i. \qquad (30)$$

Note that it follows from (28) and (30) that 

$$y = -\frac{1}{a}\sum_{i=p+1}^N a_i x_i. \qquad (30)'$$

Each of the points $x_i$, i = 1, ..., p, belongs to each of the sets $K_j$, j > p. It follows 
from (29) that (30) represents y as a convex combination of $x_1, \ldots, x_p$. Since $K_j$ is 
convex, it follows that y belongs to $K_j$ for j > p. 

On the other hand, each $x_i$, i = p + 1, ..., N, belongs to each $K_j$, j ≤ p. It follows 
from (29)' that (30)' represents y as a convex combination of $x_{p+1}, \ldots, x_N$. Since $K_j$ 
is convex, it follows that y belongs to $K_j$ for j ≤ p. This concludes the proof of 
Helly's theorem. □ 


Remark. Helly's theorem is nontrivial even in the one-dimensional case. Here 
each $K_j$ is an interval, and the hypothesis that every $K_i$ and $K_j$ intersect implies that 
the lower endpoint $a_i$ of any $K_i$ is less than or equal to the upper endpoint $b_j$ of any 
other $K_j$. The point in common to all is then sup $a_i$ or inf $b_j$, or anything in between. 
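The one-dimensional case is simple enough to write down directly. The sketch below assumes closed intervals given as (lower, upper) pairs; the function name is illustrative.

```python
def common_point(intervals):
    """Return a point lying in every closed interval [a_i, b_i].

    Raises ValueError if some pair of intervals is disjoint.
    """
    lo = max(a for a, _ in intervals)  # sup of the lower endpoints a_i
    hi = min(b for _, b in intervals)  # inf of the upper endpoints b_j
    if lo > hi:
        # Some a_i exceeds some b_j, so K_i and K_j do not intersect.
        raise ValueError("some pair of intervals is disjoint")
    return lo

intervals = [(0, 5), (2, 7), (4, 9), (3, 4)]
print(common_point(intervals))
```

Any value between `lo` and `hi` would serve equally well; returning `lo` corresponds to the choice sup $a_i$ in the remark.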


Remark. In this chapter we have defined the notions of open convex set, closed 
convex set, and bounded convex set purely in terms of the linear structure of the 
space containing the convex set. Of course the notions open, closed, bounded have a 
usual topological meaning in terms of the Euclidean distance. It is easy to see that if 
a convex set is open, closed, or bounded in the topological sense, then it is open, 
closed, or bounded in the linear sense used in this chapter. 


EXERCISE 20. Show that if a convex set in a finite-dimensional Euclidean space 
is open, or closed, or bounded in the linear sense defined above, then it is open, or 
closed, or bounded in the topological sense, and conversely. 


CHAPTER 13 


The Duality Theorem 


Let X be a linear space over the reals, dim X = n. Its dual X' consists of all linear 
functions on X. If X is represented by column vectors x of n components $x_1, \ldots, x_n$, 
then elements of X' are traditionally represented as row vectors ξ with n components 
$\xi_1, \ldots, \xi_n$. The value of ξ at x is 

$$\xi_1 x_1 + \cdots + \xi_n x_n. \qquad (1)$$

If we regard ξ as a 1 × n matrix and regard x as an n × 1 matrix, (1) is their matrix 
product ξx. 

Let Y be a subspace of X; in Chapter 2 we have defined the annihilator $Y^\perp$ of Y as 
the set of all linear functions ξ that vanish on Y, that is, satisfy 

$$\xi y = 0 \quad\text{for all } y \text{ in } Y. \qquad (2)$$

According to Theorem 3 of Chapter 2, the dual of X' is X itself, and according to 
Theorem 5 there, the annihilator of $Y^\perp$ is Y itself. In words: if ξx = 0 for all ξ in $Y^\perp$, 
then x belongs to Y. 

Suppose Y is defined as the linear space spanned by m given vectors $y_1, \ldots, y_m$ in 
X. That is, Y consists of all vectors y of the form 

$$y = \sum_{j=1}^m a_j y_j. \qquad (3)$$

Clearly, ξ belongs to $Y^\perp$ iff 

$$\xi y_j = 0, \qquad j = 1, \ldots, m. \qquad (4)$$

So for the space Y defined by (3), the duality criterion stated above can be formulated 
as follows: a vector y can be written as a linear combination (3) of m given vectors $y_j$ 
iff every ξ that satisfies (4) also satisfies ξy = 0. 

We are asking now for a criterion that a vector y be the linear combination of m 
given vectors $y_j$ with nonnegative coefficients: 

$$y = \sum_{j=1}^m p_j y_j, \qquad p_j \ge 0. \qquad (5)$$


Theorem 1 (Farkas-Minkowski). A vector y can be written as a linear 
combination of given vectors $y_j$ with nonnegative coefficients as in (5) iff every ξ that 
satisfies 

$$\xi y_j \ge 0, \qquad j = 1, \ldots, m \qquad (6)$$

also satisfies 

$$\xi y \ge 0. \qquad (6)'$$


Proof. The necessity of condition (6)' is evident upon multiplying (5) on the left
by $\xi$. To show the sufficiency we consider the set K of all points y of form (5). Clearly,
this is a convex set; we claim it is closed. To see this we first note that any vector y
which may be represented in form (5) may be represented so in various ways.
Among all these representations there is by local compactness one, or several, for
which $\sum_j p_j$ is as small as possible. We call such a representation of y a minimal
representation.


Now let $\{z_n\}$ be a sequence of points of K converging to the limit z in the
Euclidean norm. Represent each $z_n$ minimally:

$$z_n = \sum_j p_{n,j}\, y_j. \qquad (5)'$$

We claim that $\sum_j p_{n,j} = P_n$ is a bounded sequence. For suppose on the contrary that
$P_n \to \infty$. Since the sequence $z_n$ is convergent, it is bounded; therefore $z_n / P_n$ tends to
zero:

$$\frac{z_n}{P_n} = \sum_j \frac{p_{n,j}}{P_n}\, y_j \to 0. \qquad (5)''$$

The numbers $p_{n,j}/P_n$ are nonnegative and their sum is 1. Therefore by compactness
we can select a subsequence for which they converge to limits:

$$\frac{p_{n,j}}{P_n} \to q_j.$$

These limits satisfy $\sum_j q_j = 1$. It follows from (5)'' that

$$\sum_j q_j y_j = 0.$$

Subtract this from (5)':

$$z_n = \sum_j (p_{n,j} - q_j)\, y_j.$$

For each j for which $q_j > 0$, $p_{n,j} \to \infty$; therefore for n large enough, this is a positive
representation of $z_n$, showing that (5)' is not a minimal representation. This
contradiction shows that the sequence $P_n = \sum_j p_{n,j}$ is bounded. But then by local
compactness we can select a subsequence for which $p_{n,j} \to p_j$ for all j. Let n tend to
$\infty$ in (5)'; we obtain

$$z = \lim z_n = \sum_j p_j y_j.$$


Thus the limit z can be represented in the form (5); this proves that the set K of all 
points of form (5) is closed in the Euclidean norm. 

We note that the origin belongs to K. 

Let y be a vector that does not belong to K. Since K is closed and convex,
according to the hyperplane separation Theorem 7 of Chapter 12 there is a closed
halfspace

$$\eta x \ge c \qquad (7)$$

that contains K but not y:

$$\eta y < c. \qquad (8)$$

Since 0 belongs to K, it follows from (7) that $0 \ge c$. Combining this with (8), we get

$$\eta y < 0. \qquad (9)$$

Since $k y_j$ belongs to K for any positive constant k, it follows from (7) that

$$k\, \eta y_j \ge c, \quad j = 1, \ldots, m,$$

for all k > 0; this is the case only if

$$\eta y_j \ge 0, \quad j = 1, \ldots, m. \qquad (10)$$

Thus if y is not of form (5), there is an $\eta$ that according to (10) satisfies (6) but
according to (9) violates (6)'. This completes the proof of Theorem 1. □

EXERCISE 1. Show that K defined by (5) is a convex set.

We reformulate Theorem 1 in matrix language by defining the n × m matrix Y as

$$Y = (y_1, \ldots, y_m),$$

that is, the matrix whose columns are $y_j$. We denote the column vector formed by
$p_1, \ldots, p_m$ by p:

$$p = \begin{pmatrix} p_1 \\ \vdots \\ p_m \end{pmatrix}.$$

We shall call a vector, column or row, nonnegative, denoted as ≥ 0, if all its
components are nonnegative. The inequality x ≥ z means x − z ≥ 0.

Exercise 2. Show that if x ≥ z and $\xi \ge 0$, then $\xi x \ge \xi z$.


Theorem 1'. Given an n × m matrix Y, a vector y with n components can be
written in the form

$$y = Yp, \quad p \ge 0 \qquad (11)$$

iff every row vector $\xi$ that satisfies

$$\xi Y \ge 0 \qquad (12)$$

also satisfies

$$\xi y \ge 0. \qquad (12)'$$

For the proof, we merely observe that (11) is the same as (5), (12) the same as (6),
and (12)' the same as (6)'. □
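The alternative in Theorem 1' can be watched at work in the plane. The following is a minimal sketch, not part of the text: it assumes n = m = 2 and that the two columns of Y are linearly independent, so that p = Y⁻¹y is the only candidate in (11). If some component of p is negative, the corresponding row of Y⁻¹ serves as a row vector ξ satisfying (12) but violating (12)'. The function name `farkas_2d` is hypothetical.

```python
from fractions import Fraction

def farkas_2d(y1, y2, y):
    """Decide whether y = p1*y1 + p2*y2 with p1, p2 >= 0 (y1, y2 independent).

    Returns ("in_cone", (p1, p2)) or ("separated", xi), where the row vector xi
    satisfies xi.y1 >= 0, xi.y2 >= 0 but xi.y < 0.
    """
    a, c = map(Fraction, y1)          # Y = [[a, b], [c, d]] has columns y1, y2
    b, d = map(Fraction, y2)
    det = a * d - b * c
    if det == 0:
        raise ValueError("this sketch assumes y1 and y2 are linearly independent")
    # Rows of the inverse of Y: row_i . y_j equals 1 if i == j and 0 otherwise.
    rows = [(d / det, -b / det), (-c / det, a / det)]
    u, v = map(Fraction, y)
    p = [r[0] * u + r[1] * v for r in rows]   # p = Y^{-1} y
    if all(pj >= 0 for pj in p):
        return "in_cone", tuple(p)
    # A row belonging to a negative coefficient: xi.y_j >= 0 for all j, xi.y = p_i < 0.
    i = 0 if p[0] < 0 else 1
    return "separated", rows[i]

print(farkas_2d((1, 0), (1, 1), (2, 1)))   # y lies in the cone
print(farkas_2d((1, 0), (1, 1), (0, 1)))   # y lies outside; a separating xi is returned
```

For general n and m no such shortcut is available; the proof above rests on the hyperplane separation theorem instead.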


The following is a useful extension. 


Theorem 2. Given an n × m matrix Y and a column vector y with n
components, the inequality

$$y \ge Yp, \quad p \ge 0 \qquad (13)$$

can be satisfied iff every $\xi$ that satisfies

$$\xi Y \ge 0, \quad \xi \ge 0 \qquad (14)$$

also satisfies

$$\xi y \ge 0. \qquad (15)$$


Proof. To prove necessity, multiply (13) by $\xi$ on the left and use (14) to deduce
(15). Conversely, by definition of ≥ 0 for vectors, (13) means that there is a column
vector z with n components such that

$$y = Yp + z, \quad z \ge 0, \quad p \ge 0. \qquad (13)'$$

We can rewrite (13)' by introducing the n × n identity matrix I, the augmented
matrix (Y, I), and the augmented vector $\begin{pmatrix} p \\ z \end{pmatrix}$. In terms of these (13)' can be written as

$$y = (Y, I) \begin{pmatrix} p \\ z \end{pmatrix}, \quad \begin{pmatrix} p \\ z \end{pmatrix} \ge 0, \qquad (13)''$$

and (14) can be written as

$$\xi (Y, I) \ge 0. \qquad (14)'$$

We now apply Theorem 1' to the augmented matrix and vector to deduce that if
(15) is satisfied whenever (14)' is, then (13)'' has a solution, as asserted in
Theorem 2. □


Theorem 3 (Duality Theorem). Let Y be a given n × m matrix, y a given
column vector with n components, and $\gamma$ a given row vector with m components.
We define two quantities, S and s, as follows:

Definition.

$$S = \sup_p \gamma p \qquad (16)$$

for all column vectors p with m components satisfying

$$y \ge Yp, \quad p \ge 0. \qquad (17)$$

We call the set of p satisfying (17) admissible for the sup problem (16).

Definition.

$$s = \inf_\xi \xi y \qquad (18)$$

for all row vectors $\xi$ with n components satisfying the admissibility conditions

$$\gamma \le \xi Y, \quad \xi \ge 0. \qquad (19)$$

We call the set of $\xi$ satisfying (19) admissible for the inf problem (18).

Assertion. Suppose that there are admissible vectors p and $\xi$; then S and s are
finite, and

$$S = s.$$


Proof. Let p and $\xi$ be admissible vectors. Multiply (17) by $\xi$ on the left, (19) by p
on the right. Using Exercise 2 we conclude that

$$\xi y \ge \xi Y p \ge \gamma p.$$

This shows that any $\gamma p$ is bounded from above by every $\xi y$; therefore

$$s \ge S. \qquad (20)$$

To show that equality actually holds, it suffices to display a single p admissible for
the sup problem (16) for which

$$\gamma p \ge s. \qquad (21)$$

To accomplish this, we combine (17) and (21) into a single inequality by augmenting
the matrix Y with an extra row $-\gamma$, and the vector y with an extra component −s:

$$\begin{pmatrix} y \\ -s \end{pmatrix} \ge \begin{pmatrix} Y \\ -\gamma \end{pmatrix} p, \quad p \ge 0. \qquad (22)$$


If this inequality has no solution, then according to Theorem 2 there is a row vector $\xi$
and a scalar $\alpha$ such that

$$(\xi, \alpha) \begin{pmatrix} Y \\ -\gamma \end{pmatrix} \ge 0, \quad (\xi, \alpha) \ge 0, \qquad (23)$$

but

$$(\xi, \alpha) \begin{pmatrix} y \\ -s \end{pmatrix} < 0. \qquad (24)$$

We claim that $\alpha > 0$; for, if $\alpha = 0$, then (23) implies that

$$\xi Y \ge 0, \quad \xi \ge 0, \qquad (23)'$$

and (24) that

$$\xi y < 0. \qquad (24)'$$

According to the "only if" part of Theorem 2 this shows that (13), the same as (17),
cannot be satisfied; this means that there is no admissible p, contrary to assumption.

Having shown that $\alpha$ is necessarily positive, we may, because of the homogeneity
of (23) and (24), take $\alpha = 1$. Writing out these inequalities gives

$$\xi Y \ge \gamma, \quad \xi \ge 0 \qquad (25)$$

and

$$\xi y < s. \qquad (26)$$

Inequality (25), the same as (19), shows that $\xi$ is admissible; (26) shows that s is
not the infimum (18), a contradiction we got into by denying that we can satisfy (21).
Therefore (21) can be satisfied; this implies that equality holds in (20). This proves
that S = s. □
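As an illustration only, the smallest possible case n = m = 1 can be worked by hand. The numbers a, b, c below are assumed data, not taken from the text: Y = (a) with a > 0, y = b ≥ 0, and the row vector of the sup problem equal to c > 0.

```latex
% Sup problem (16)-(17): maximize c p subject to b >= a p, p >= 0.
% The admissible set is 0 <= p <= b/a, so
S = \sup_{0 \le p \le b/a} c\,p = \frac{c\,b}{a}.
% Inf problem (18)-(19): minimize \xi b subject to c <= \xi a, \xi >= 0.
% The admissible set is \xi >= c/a, and b >= 0, so
s = \inf_{\xi \ge c/a} \xi\, b = \frac{c\,b}{a}.
% Hence S = s, as Theorem 3 asserts, and both extremal values are attained.
```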


Exercise 3. Show that the sup and inf in Theorem 3 are a maximum and
minimum. [Hint: The sign of equality holds in (21).]


We give now an application of the duality theorem in economics. 

We are keeping track of n different kinds of food (milk, meat, fruit, bread, etc.) 
and m different kinds of nutrients (protein, fat, carbohydrates, vitamins, etc.). We 
denote 


$Y_{ij}$ = number of units of the jth nutrient present in one unit of the ith food item.
$\gamma_j$ = minimum daily requirement of the jth nutrient.
$y_i$ = price of one unit of the ith food item.

Note that all these quantities are nonnegative.

Suppose our daily food purchase consists of $\xi_i$ units of the ith food item. We insist
on satisfying all the daily minimum requirements:

$$\sum_i \xi_i Y_{ij} \ge \gamma_j, \quad j = 1, \ldots, m. \qquad (27)$$

This inequality can be satisfied, provided that each nutrient is present in at least one
of the foods.

The total cost of the purchase is

$$\sum_i \xi_i y_i. \qquad (28)$$

A natural question is, What is the minimal cost of food that satisfies the daily
minimum requirements? Clearly, this is the minimum of (28) subject to (27) and
$\xi \ge 0$, since we cannot purchase negative amounts. If we identify the column vector
formed by the $y_i$ with y, the row vector formed by the $\gamma_j$ with $\gamma$, and the matrix $Y_{ij}$
with Y, the quantity (28) to be minimized is the same as (18), and (27) is the same as
(19). Thus the infimum s in the duality theorem can in this model be identified with
minimum cost.

To arrive at an interpretation of the supremum S we denote by $\{p_j\}$ a possible set
of values for the nutrients that is consistent with the prices. That is, we require that

$$y_i \ge \sum_j Y_{ij} p_j, \quad i = 1, \ldots, n. \qquad (29)$$

The value of the minimum daily requirement is

$$\sum_j \gamma_j p_j. \qquad (30)$$

Since clearly $p_j$ are nonnegative, the restriction (29) is the same as (17). The quantity
(30) is the same as that maximized in (16). Thus the quantity S in the duality theorem
is the largest possible value of the total daily requirement, consistent with the prices.
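A small numerical instance may make the two readings concrete. The sketch below uses made-up data for two foods and two nutrients (none of it comes from the text) and exact rational arithmetic. It checks that a particular purchase and a particular set of nutrient values are both admissible and give the same number; by inequality (20), both are then optimal.

```python
from fractions import Fraction as F

# Hypothetical data: 2 foods (rows i), 2 nutrients (columns j).
Y = [[F(2), F(1)],       # Y[i][j]: units of nutrient j in one unit of food i
     [F(1), F(3)]]
y = [F(4), F(6)]         # y[i]: price of one unit of food i
gamma = [F(3), F(4)]     # gamma[j]: minimum daily requirement of nutrient j

def admissible_purchase(xi):
    """Conditions (19)/(27): xi >= 0 and every requirement is met."""
    met = all(sum(xi[i] * Y[i][j] for i in range(2)) >= gamma[j] for j in range(2))
    return met and all(t >= 0 for t in xi)

def admissible_values(p):
    """Conditions (17)/(29): p >= 0 and the values are consistent with the prices."""
    ok = all(y[i] >= sum(Y[i][j] * p[j] for j in range(2)) for i in range(2))
    return ok and all(t >= 0 for t in p)

def cost(xi):
    """Total cost (28) of the purchase xi."""
    return sum(xi[i] * y[i] for i in range(2))

def value(p):
    """Value (30) of the minimum daily requirement at nutrient values p."""
    return sum(gamma[j] * p[j] for j in range(2))

xi0 = [F(1), F(1)]            # candidate purchase
p0 = [F(6, 5), F(8, 5)]       # candidate nutrient values

print(admissible_purchase(xi0), cost(xi0))
print(admissible_values(p0), value(p0))
```

Both candidates are admissible and both quantities equal 10, so for this data the minimum cost and the largest value of the daily requirement coincide, as the duality theorem predicts.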

A second application comes from game theory. We consider two-person,
deterministic, zero-sum games. Such a game can (by definition) always be presented
as a matrix game, defined as follows:

An n × m matrix Y, called the payoff matrix, is given. The game consists of player
C picking one of the columns and player R picking one of the rows; neither player
knows what the other has picked but both are familiar with the payoff matrix. If C
chooses column j and R chooses row i, then the outcome of the game is the payment
of the amount $Y_{ij}$ by player C to player R. If $Y_{ij}$ is a negative number, then R pays C.

We think of this game as being played repeatedly many times. Furthermore, the
players do not employ the same strategy each time, that is, do not pick the same row,
respectively, column, each time, but employ a so-called mixed strategy which
consists of picking rows, respectively columns, at random but according to a set of
frequencies which each player is free to choose. That is, player C will choose the jth
column with frequency $x_j$, where x is a probability vector, that is,

$$x_j \ge 0, \quad \sum_j x_j = 1. \qquad (31)$$

Player R will choose the ith row with frequency $\eta_i$,

$$\eta_i \ge 0, \quad \sum_i \eta_i = 1. \qquad (31)'$$

Since the choices are made at random, the choices of C and R are independent of
each other. It follows that the frequency with which C chooses column j and R
chooses row i in the same game is the product $\eta_i x_j$.

Since the payoff of C to R is $Y_{ij}$, the average payoff over a long time is

$$\sum_{i,j} \eta_i Y_{ij} x_j.$$

In vector-matrix notation that is

$$\eta Y x. \qquad (32)$$

If C has picked his mix x of strategies, then by observing over a long time R can
determine the relative frequencies that C is using, and therefore will choose his own
mix $\eta$ of strategies so that he maximizes his gain:

$$\max_\eta \eta Y x. \qquad (33)$$

Suppose C is a conservative player, that is, C anticipates that R will adjust his mix so
as to gain the maximum amount (33). Since R's gain is C's loss, C chooses his mix x
to minimize his loss, that is, so that (33) is a minimum:

$$\min_x \max_\eta \eta Y x, \qquad (34)$$

x and $\eta$ probability vectors.

If, on the other hand, we suppose that R is the conservative player, R will assume
that C will guess R's mix $\eta$ first and therefore C will choose x so that C's loss is
minimized:

$$\min_x \eta Y x. \qquad (33)'$$

R therefore picks his mix $\eta$ so that the outcome (33)' is as large as possible:

$$\max_\eta \min_x \eta Y x. \qquad (34)'$$


Theorem 4 (Minmax Theorem). The minmax (34) and the maxmin (34)',
where $\eta$ and x are required to be probability vectors, are equal:

$$\min_x \max_\eta \eta Y x = \max_\eta \min_x \eta Y x. \qquad (35)$$

The quantity (35) is called the value of the matrix game Y.

Proof. Denote by E the n × m matrix of all 1s. For any pair of probability vectors
$\eta$ and x, $\eta E x = 1$. Therefore if we replace Y by Y + kE, we merely add k to both (34)
and (34)'. For k large enough all entries of Y + kE are positive; so we may consider
only matrices Y with all positive entries.

We shall apply the duality theorem with

$$\gamma = (1, \ldots, 1) \quad \text{and} \quad y = \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}. \qquad (36)$$

Since y is positive, the maximum problem

$$S = \max_p \gamma p, \quad y \ge Yp, \quad p \ge 0 \qquad (37)$$

has positive admissible vectors p. Since the entries of $\gamma$ are positive, S > 0. We
denote by $p_0$ a vector where the maximum is achieved.

Since Y > 0, the minimum problem

$$s = \min_\xi \xi y, \quad \xi Y \ge \gamma, \quad \xi \ge 0, \qquad (37)'$$

has admissible vectors $\xi$. We denote by $\xi_0$ a vector where the minimum is reached.
According to (36), all components of $\gamma$ are 1; therefore $\gamma p_0$ is the sum of the
components of $p_0$. Since $\gamma p_0 = S$,

$$x_0 = \frac{p_0}{S} \qquad (38)$$

is a probability vector. Using an analogous argument, we deduce that

$$\eta_0 = \frac{\xi_0}{s} \qquad (38)'$$

is a probability vector.

We claim that $x_0$ and $\eta_0$ are solutions of the minmax and maxmin problems (34)
and (34)', respectively. To see this, set $p_0$ into the second part of (37), and divide by
S. Using the definition $x_0 = p_0/S$, we get

$$\frac{y}{S} \ge Y x_0. \qquad (39)$$

Multiply this on the left with any probability vector $\eta$. Since according to (36) all
components of y are 1, $\eta y = 1$, and so

$$\frac{1}{S} \ge \eta Y x_0. \qquad (40)$$

It follows from this that

$$\frac{1}{S} \ge \max_\eta \eta Y x_0,$$

from which it follows that

$$\frac{1}{S} \ge \min_x \max_\eta \eta Y x. \qquad (41)$$

On the other hand, we deduce from (40) that for all $\eta$,

$$\frac{1}{S} \ge \min_x \eta Y x,$$

from which it follows that

$$\frac{1}{S} \ge \max_\eta \min_x \eta Y x. \qquad (42)$$

Similarly we set $\xi_0$ for $\xi$ into the second part of (37)', divide by s, and multiply by
any probability vector x. By definition (38)', $\eta_0 = \xi_0/s$; since according to (36) all
components of $\gamma$ are 1, $\gamma x = 1$. So we get

$$\eta_0 Y x \ge \frac{1}{s}. \qquad (40)'$$

From this we deduce that for any probability vector x,

$$\max_\eta \eta Y x \ge \frac{1}{s},$$

from which it follows that

$$\min_x \max_\eta \eta Y x \ge \frac{1}{s}. \qquad (41)'$$

On the other hand, it follows from (40)' that

$$\min_x \eta_0 Y x \ge \frac{1}{s},$$

from which it follows that

$$\max_\eta \min_x \eta Y x \ge \frac{1}{s}. \qquad (42)'$$

Since by the duality theorem S = s, (41) and (41)' together show that

$$\min_x \max_\eta \eta Y x = \frac{1}{s} = \frac{1}{S},$$

while (42) and (42)' show that

$$\max_\eta \min_x \eta Y x = \frac{1}{s} = \frac{1}{S}.$$

This proves the minmax theorem. □
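To see the theorem on a concrete game, here is a short sketch for "matching pennies", a standard 2 × 2 example chosen for illustration and not taken from the text. It relies on one elementary fact: for fixed x, the linear function ηYx of η attains its maximum over probability vectors at a pure strategy, so max over η of ηYx is the largest component of Yx, and similarly for the minimum over x. The helper names are hypothetical.

```python
from fractions import Fraction as F

def row_payoffs(Y, x):
    """Components of Y x: R's average gain from each pure row against C's mix x."""
    return [sum(Y[i][j] * x[j] for j in range(len(x))) for i in range(len(Y))]

def col_payoffs(Y, eta):
    """Components of eta Y: R's average gain against each pure column, for R's mix eta."""
    return [sum(eta[i] * Y[i][j] for i in range(len(Y))) for j in range(len(Y[0]))]

def best_reply_gain(Y, x):
    """max over probability vectors eta of eta Y x, as in (33)."""
    return max(row_payoffs(Y, x))

def guaranteed_gain(Y, eta):
    """min over probability vectors x of eta Y x, as in (33)'."""
    return min(col_payoffs(Y, eta))

# Matching pennies: R wins 1 if the choices agree, loses 1 otherwise.
Y = [[F(1), F(-1)],
     [F(-1), F(1)]]
half = [F(1, 2), F(1, 2)]

print(best_reply_gain(Y, half))   # upper bound for the minmax (34)
print(guaranteed_gain(Y, half))   # lower bound for the maxmin (34)'
```

Since the maxmin never exceeds the minmax, and here the upper and lower bounds coincide, the common number is the value of this game and the uniform mixes are optimal for both players. A pure strategy does worse: against C's pure column 1, R's best reply gains 1.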


The minmax theorem is due to von Neumann. It has important implications for 
economic theory. 


CHAPTER 14 


Normed Linear Spaces 


In Chapter 12, Theorem 2, we saw that every open, convex set K in a linear space X
over ℝ containing the origin can be described as the set of vectors x satisfying
p(x) < 1, where p, the gauge function of K, is a subadditive, positive homogeneous
function, positive except at the origin. Here we consider such functions with one 
additional property: evenness, that is, p(—x) = p(x). Such a function is called a 
norm, and is denoted by the symbol |x|, the same as absolute value. We list now the 
properties of a norm: 


(i) Positivity: |x| > 0 for x ≠ 0, |0| = 0.
(ii) Subadditivity: |x + y| ≤ |x| + |y|. (1)
(iii) Homogeneity: for any real number k, |kx| = |k||x|.


A linear space with a norm is called a normed linear space. Except for Theorem 
4, in this chapter X denotes a finite-dimensional normed linear space. 


Definition. The set of points x in X satisfying |x| < 1 is called the open unit
ball around the origin; the set |x| ≤ 1 is called the closed unit ball.


EXERCISE 1. (a) Show that the open and closed unit balls are convex. 
(b) Show that the open and closed unit balls are symmetric with respect to the 
origin, that is, if x belongs to the unit ball, so does —x. 


Definition. The distance of two vectors x and y in X is defined as

$$|x - y|.$$

EXERCISE 2. Prove the triangle inequality, that is, for all x, y, z in X,

$$|x - z| \le |x - y| + |y - z|. \qquad (2)$$

Definition. Given a point y and a positive number r, the set of x satisfying
|x − y| < r is called the open ball of radius r, center y; it is denoted B(y, r).


Examples

$X = \mathbb{R}^n$, $x = (a_1, \ldots, a_n)$.

(a) Define

$$|x|_\infty = \max_j |a_j|. \qquad (3)$$

Properties (i) and (iii) are obvious; property (ii) is easy to show.

(b) Define $|x|_2$ as the Euclidean norm:

$$|x|_2 = \Big( \sum_j |a_j|^2 \Big)^{1/2}. \qquad (4)$$

Properties (i) and (iii) are obvious; property (ii) was shown in Theorem 3 of
Chapter 7.

(c) Define

$$|x|_1 = \sum_j |a_j|. \qquad (5)$$

EXERCISE 3. Prove that $|x|_1$ defined by (5) has all three properties (1) of a norm.

The next example includes the first three as special cases:

(d) p any real number, 1 ≤ p; we define

$$|x|_p = \Big( \sum_j |a_j|^p \Big)^{1/p}. \qquad (6)$$
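The four examples are easy to evaluate numerically. The following sketch (floating-point, with an arbitrarily chosen sample vector) computes the norms (3) to (6) and shows $|x|_p$ approaching $|x|_\infty$ as p grows; the function names are illustrative only.

```python
def norm_p(x, p):
    """|x|_p as in (6), for a real number p with 1 <= p."""
    return sum(abs(a) ** p for a in x) ** (1.0 / p)

def norm_max(x):
    """|x|_infinity as in (3)."""
    return max(abs(a) for a in x)

x = [3.0, -4.0, 1.0]   # sample vector, chosen arbitrarily

print(norm_p(x, 1))    # the norm (5): sum of absolute values
print(norm_p(x, 2))    # the Euclidean norm (4)
print(norm_max(x))     # the max norm (3)
for p in (2, 5, 10, 50):
    print(p, norm_p(x, p))   # decreases toward norm_max(x) as p grows
```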


Theorem 1. $|x|_p$ defined by (6) is a norm, that is, it has properties (1).

Proof. Properties (i) and (iii) are obvious. To prove (ii) we need the
following:

Hölder's Inequality. Let p and q be positive numbers that satisfy

$$\frac{1}{p} + \frac{1}{q} = 1. \qquad (7)$$

Let $(a_1, \ldots, a_n) = x$ and $(b_1, \ldots, b_n) = y$ be two vectors; then

$$xy \le |x|_p\, |y|_q, \qquad (8)$$

where the product xy is defined as

$$xy = \sum_j a_j b_j; \qquad (9)$$

$|x|_p$, $|y|_q$ are defined by (6). Equality in (8) holds iff $|a_j|^p$ and $|b_j|^q$ are proportional
and $\operatorname{sgn} a_j = \operatorname{sgn} b_j$, j = 1, ..., n.


EXERCISE 4. Prove or look up a proof of Hölder's inequality.

Note. For p = q = 2, Hölder's inequality is the Schwarz inequality (see Theorem 1,
Chapter 7).

EXERCISE 5. Prove that

$$|x|_\infty = \lim_{p \to \infty} |x|_p,$$

where $|x|_\infty$ is defined by (3).

Corollary. For any vector x,

$$|x|_p = \max_{|y|_q = 1} xy. \qquad (10)$$


Proof. Inequality (8) shows that when $|y|_q = 1$, xy cannot exceed $|x|_p$. Therefore
to prove (10) we have to exhibit a single vector $y_0$, $|y_0|_q = 1$, for which $xy_0 = |x|_p$.
Here it is:

$$y_0 = \frac{z}{|x|_p^{p/q}}, \quad z = (c_1, \ldots, c_n), \quad c_j = \operatorname{sgn} a_j\, |a_j|^{p/q}. \qquad (11)$$

Clearly

$$|y_0|_q = \frac{|z|_q}{|x|_p^{p/q}} \qquad (12)$$

and

$$|z|_q^q = \sum_j |c_j|^q = \sum_j |a_j|^p = |x|_p^p. \qquad (12)'$$

Combining (12) and (12)',

$$|y_0|_q = \frac{|x|_p^{p/q}}{|x|_p^{p/q}} = 1. \qquad (13)$$

From (11),

$$xy_0 = \frac{xz}{|x|_p^{p/q}} = \frac{\sum_j |a_j|\,|a_j|^{p/q}}{|x|_p^{p/q}} = \frac{|x|_p^p}{|x|_p^{p/q}} = |x|_p, \qquad (13)'$$

where we have used (7) to set 1 + p/q = p. Formulas (13) and (13)' complete the
proof of the corollary. □
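The extremal vector (11) can be checked numerically. The sketch below assumes p > 1 and x ≠ 0 and works in floating point, so the two identities $|y_0|_q = 1$ and $xy_0 = |x|_p$ hold only up to rounding. The function names are hypothetical.

```python
import math

def norm_p(x, p):
    """|x|_p as in (6)."""
    return sum(abs(a) ** p for a in x) ** (1.0 / p)

def extremal_vector(x, p):
    """Return (y0, q) with y0 built from formula (11); assumes p > 1 and x != 0."""
    q = p / (p - 1.0)                                       # conjugate exponent from (7)
    z = [math.copysign(abs(a) ** (p / q), a) for a in x]    # c_j = sgn(a_j) |a_j|^(p/q)
    scale = norm_p(x, p) ** (p / q)
    return [c / scale for c in z], q

x = [3.0, -4.0, 1.0]
p = 3.0
y0, q = extremal_vector(x, p)

print(norm_p(y0, q))                                   # should be 1 up to rounding
print(sum(a * b for a, b in zip(x, y0)), norm_p(x, p)) # the two numbers should agree
```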


To prove subadditivity for $|x|_p$ we use the corollary. Let x and z be any two
vectors; then by (10),

$$|x + z|_p = \max_{|y|_q = 1} (x + z)y \le \max_{|y|_q = 1} xy + \max_{|y|_q = 1} zy = |x|_p + |z|_p.$$

This proves that the $l^p$ norm is subadditive. □


We return now to arbitrary norms. 


Definition. Two norms in a finite-dimensional linear space X, $|x|_1$ and $|x|_2$, are
called equivalent if there is a constant c such that for all x in X,

$$|x|_1 \le c\,|x|_2, \quad |x|_2 \le c\,|x|_1. \qquad (14)$$

Theorem 2. In a finite-dimensional linear space, all norms are equivalent; that
is, any two satisfy (14) with some c, depending on the pair of norms.


Proof. Any finite-dimensional linear space X over ℝ is isomorphic to $\mathbb{R}^n$,
n = dim X; so we may take X to be $\mathbb{R}^n$. In Chapter 7 we introduced the Euclidean
norm:

$$\| x \| = \Big( \sum_j a_j^2 \Big)^{1/2}, \quad x = (a_1, \ldots, a_n). \qquad (15)$$

Denote by $e_j$ the unit vectors in $\mathbb{R}^n$. Then $x = (a_1, \ldots, a_n)$ can be written as

$$x = \sum_j a_j e_j. \qquad (16)$$

Let |x| be any other norm in $\mathbb{R}^n$. Using subadditivity and homogeneity repeatedly
we get

$$|x| \le \sum_j |a_j|\,|e_j|. \qquad (16)'$$

Applying the Schwarz inequality to (16)' (see Theorem 1, Chapter 7), we get, using
(15),

$$|x| \le \Big( \sum_j |e_j|^2 \Big)^{1/2} \Big( \sum_j a_j^2 \Big)^{1/2} = c\,\| x \|, \qquad (17)$$

where c abbreviates $\big( \sum_j |e_j|^2 \big)^{1/2}$. This gives one half of inequalities (14).
To get the other half, we show first that |x| is a continuous function with respect to
the Euclidean distance. By subadditivity,

$$|x| \le |x - y| + |y|, \quad |y| \le |x - y| + |x|,$$

from which we deduce that

$$\big|\, |x| - |y| \,\big| \le |x - y|.$$

Using inequality (17), we get

$$\big|\, |x| - |y| \,\big| \le c\,\| x - y \|,$$

which shows that |x| is a continuous function in the Euclidean norm.

It was shown in Chapter 7 that the unit sphere S in a finite-dimensional Euclidean
space, ||x|| = 1, is a compact set. Therefore the continuous function |x| achieves its
minimum on S. Since by (1), |x| is positive at every point of S, it follows that the
minimum m is positive. Thus we conclude that

$$0 < m \le |x| \quad \text{when } \| x \| = 1. \qquad (18)$$

Since both |x| and ||x|| are homogeneous functions, we conclude that

$$m\,\| x \| \le |x| \qquad (19)$$

for all x in $\mathbb{R}^n$. This proves the second half of the inequalities (14), and proves that
any norm in $\mathbb{R}^n$ is equivalent in the sense of (14) with the Euclidean norm.

The notion of equivalence is transitive; if $|x|_1$ and $|x|_2$ are both equivalent to the
Euclidean norm, then they are equivalent to each other. This completes the proof of
Theorem 2. □
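For a specific norm the constants c of (17) and m of (18) can often be written down. As an illustration (an example chosen here, not one from the text), take the norm $|x|_1$ of example (c) on $\mathbb{R}^n$: each unit vector has $|e_j|_1 = 1$, so $c = \sqrt{n}$, and the minimum of $|x|_1$ on the Euclidean unit sphere is m = 1, attained at $\pm e_j$. The sketch below checks the resulting two-sided bound on a few sample vectors.

```python
import math

def euclid(x):
    """The Euclidean norm (15)."""
    return math.sqrt(sum(a * a for a in x))

def norm_1(x):
    """The norm |x|_1 of example (c)."""
    return sum(abs(a) for a in x)

def constants_for_norm_1(n):
    """Return (m, c) with m*||x|| <= |x|_1 <= c*||x|| on R^n."""
    c = math.sqrt(n)   # c = (sum_j |e_j|_1^2)^(1/2), and |e_j|_1 = 1 for each j
    m = 1.0            # minimum of |x|_1 over the Euclidean unit sphere
    return m, c

m, c = constants_for_norm_1(3)
for x in ([1.0, 0.0, 0.0], [1.0, -1.0, 1.0], [3.0, -4.0, 12.0]):
    print(m * euclid(x), norm_1(x), c * euclid(x))   # printed in increasing order
```

The first sample vector attains the lower bound and the second attains the upper bound, so neither constant can be improved for this norm.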


Definition. A sequence $\{x_n\}$ in a normed linear space is called convergent to
the limit x, denoted as $\lim x_n = x$, if $\lim |x_n - x| = 0$.

Obviously, the notion of convergence of sequences is the same with respect to two
equivalent norms; so by Theorem 2, it is the same for any two norms.

Definition. A set S in a normed linear space is called closed if it contains the
limits of all convergent sequences $\{x_n\}$, $x_n$ in S.

EXERCISE 6. Prove that every subspace of a finite-dimensional normed linear
space is closed.

Definition. A set S in a normed linear space is called bounded if it is contained
in some ball, that is, if there is an R such that for all points z in S, |z| ≤ R. Clearly, if a
set is bounded in the sense of one norm, it is bounded in the sense of any equivalent
norm, and so by Theorem 2 for all norms.

Definition. A sequence of vectors $\{x_n\}$ in a normed linear space is called a
Cauchy sequence if $|x_k - x_j|$ tends to zero as k and j tend to infinity.

Theorem 3. (i) In a finite-dimensional normed linear space X, every Cauchy
sequence converges to a limit.

(ii) Every bounded infinite sequence $\{x_n\}$ in a finite-dimensional normed linear
space X has a convergent subsequence.

Property (i) of X is called completeness, and property (ii) is called local
compactness.


Proof. (i) Introduce a Euclidean structure in X. According to Theorem 2, the 
Euclidean norm and the norm in X are equivalent. Therefore a Cauchy sequence in 
the norm of X is also a Cauchy sequence in the Euclidean norm. According to 
Theorem 16 in Chapter 7, a Cauchy sequence in a finite-dimensional Euclidean 
space converges. But then the sequence also converges in the norm of X. 

(ii) A sequence $\{x_n\}$ that is bounded in the norm of X is also bounded in the
Euclidean norm imposed on X. According to Theorem 16 of Chapter 7, it contains a
subsequence that converges in the Euclidean norm. But then that subsequence also
converges in the norm of X. □

Just as in Euclidean space, see Theorem 17 in Chapter 7, part (ii) of Theorem 3
has a converse:


Theorem 4. Let X be a normed linear space that is locally compact—that is, in 
which every bounded sequence has a convergent subsequence. Then X is finite- 
dimensional. 


Proof. We need the following result.

Lemma 5. Let Y be a finite-dimensional subspace of a normed linear space X.
Let x be a vector in X that does not belong to Y. Then

$$d = \inf_{y \text{ in } Y} |x - y|$$

is positive.

Proof. Suppose not; then there would be a sequence of vectors $\{y_n\}$ in Y such that

$$\lim |x - y_n| = 0.$$

In words, $y_n$ tends to x. It follows that $\{y_n\}$ is a Cauchy sequence; according to part
(i) of Theorem 3, applied to the finite-dimensional space Y, $y_n$ converges to a limit
in Y. This would show that the limit x of $\{y_n\}$ belongs to Y, contrary to the choice
of x. □


Suppose X infinite-dimensional; we shall construct a sequence $\{y_n\}$ in X with the
following properties:

$$|y_k| < 2, \quad |y_k - y_l| \ge 1 \ \text{ for } k \ne l. \qquad (20)$$

Clearly, such a sequence is bounded and, equally clearly, contains no convergent
subsequence.

We shall construct the sequence recursively. Suppose $y_1, \ldots, y_n$ have been
chosen; denote by Y the space spanned by them. Since X is infinite-dimensional,
there is an x in X that does not belong to Y. We appeal now to Lemma 5,

$$d = \inf_{y \text{ in } Y} |x - y| > 0.$$

By definition of infimum, there is a vector $y_0$ in Y which satisfies

$$|x - y_0| < 2d.$$

Define

$$y_{n+1} = \frac{x - y_0}{d}. \qquad (21)$$

It follows from the inequality above that $|y_{n+1}| < 2$. For any y in Y, $y_0 + dy$ belongs
to Y. Therefore by definition of infimum,

$$|x - y_0 - dy| \ge d.$$

Dividing this by d and using the definition of $y_{n+1}$, we get

$$|y_{n+1} - y| \ge 1.$$

Since every $y_l$, l = 1, ..., n, belongs to Y,

$$|y_{n+1} - y_l| \ge 1 \quad \text{for } l = 1, \ldots, n.$$

This completes the recursive construction of the sequence $\{y_n\}$ with property
(20). □

Theorem 4 is due to Frederic Riesz.

EXERCISE 7. Show that the infimum in Lemma 5 is a minimum.

We have seen in Theorem 5 of Chapter 7 that every linear function l in a
Euclidean space can be written in the form of a scalar product l(x) = (x, y).
Therefore by the Schwarz inequality, Theorem 1 of Chapter 7,

$$|l(x)| \le \| x \|\, \| y \|.$$

Combining this with (19), we deduce that

$$|l(x)| \le c\,|x|, \quad c = \frac{\| y \|}{m}.$$

We can restate this as Theorem 6.


Theorem 6. Let X be a finite-dimensional normed linear space, and let l be a
linear function defined on X. Then there is a constant c such that

$$|l(x)| \le c\,|x| \qquad (22)$$

for all x in X.

Corollary 6'. Every linear function on a finite-dimensional normed linear space
is continuous.

Proof. Using the linearity of l and inequality (22), we deduce that
$|l(x) - l(y)| = |l(x - y)| \le c\,|x - y|$. □

Definition. Denote by $c_0$ the infimum of all numbers c for which (22) holds for
all x. Clearly, (22) holds for $c = c_0$, and $c_0$ is the smallest number c for which (22)
holds; $c_0$ is called the norm of the linear function l, denoted as |l|'.

The norm of l can also be characterized as

$$|l|' = \sup_{x \ne 0} \frac{|l(x)|}{|x|}. \qquad (23)$$

It follows from (23) that for all x and all l,

$$|l(x)| \le |l|'\,|x|. \qquad (24)$$


Theorem 7. X is a finite-dimensional normed linear space.

(i) Given a linear function l defined on X, there is an x in X, x ≠ 0, for which
equality holds in (24).

(ii) Given a vector x in X, there is a linear function l defined on X, l ≠ 0, for
which equality holds in (24).

Proof. (i) We shall show that the supremum definition (23) of |l|' is a maximum.
We note that the ratio |l(x)|/|x| doesn't change if we replace x by any multiple of x.
Therefore it suffices to take the supremum (23) over the unit sphere |x| = 1.

According to Corollary 6', l(x) is a continuous function; then so is |l(x)|. Since the
space X is locally compact, the continuous function |l(x)| takes on its maximum
value at some point x of the unit sphere. At this point, equality holds in (24).

(ii) If x = 0, any l will do.

For x ≠ 0, we define l(x) = |x|; since l is linear, we set for any scalar k

$$l(kx) = k\,|x|. \qquad (25)$$

We appeal now to the Hahn-Banach Theorem, Theorem 4 in Chapter 12. We choose
the positive homogeneous, subadditive function p(x) to be |x|, and the subspace U on
which l is defined consists of all multiples of x. It follows from (25) that for all u in
U, l(u) ≤ |u|. According to Hahn-Banach, l can be extended to all y of X so that
l(y) ≤ |y| for all y. Setting −y for y, we deduce that |l(y)| ≤ |y| as well. So by
definition (23) of the norm of l, it follows that |l|' ≤ 1. Since l(x) = |x|, it follows
that |l|' = 1, so equality holds in (24). □


In Chapter 2 we have defined the dual of a finite-dimensional linear space X as
the set of all linear functions l defined on X. These functions form a linear space,
denoted as X'. We have shown in Chapter 2 that the dual of X' can be identified with
X itself: X'' = X, as follows. For each x in X we define a linear function f over X' by
setting

$$f(l) = l(x). \qquad (26)$$

We have shown in Chapter 2 that these are all the linear functions on X'.

When X is a finite-dimensional normed linear space, there is an induced norm |l|'
in X', defined by formula (23). This, in turn, induces a norm in the dual X'' of X'.


Theorem 8. The norm induced in X'' by the induced norm in X' is the same as
the original norm in X.

Proof. The norm of a linear function f on X' is, according to formula (23),

$$|f|'' = \sup_{l \ne 0} \frac{|f(l)|}{|l|'}. \qquad (27)$$

The linear functions f on X' are of the form (26); setting this into (27) gives

$$|f|'' = \sup_{l \ne 0} \frac{|l(x)|}{|l|'}. \qquad (28)$$

According to (24), |l(x)|/|l|' ≤ |x| for all l ≠ 0. According to part (ii) of Theorem 7,
equality holds for some l. This proves that |f|'' = |x|. □


EXERCISE 8. Show that |l|' defined by (23) satisfies all postulates for a norm
listed in (1).

Note. The dual of an infinite-dimensional normed linear space X consists of all
linear functions on X that are bounded in the sense of (22). The induced norm on X' is
defined by (24). Theorem 7 holds in infinite-dimensional spaces.

The dual of X' is defined analogously. For each x in X, we can define a linear
function f by formula (26); f is bounded and its bound equals |x|. So f lies in X''; but
for many spaces X that are used in analysis, it is no longer true that all elements f in
X'' are of the form (26).

Part (ii) of Theorem 7 can be stated as follows:

$$|x| = \max_{|l|' = 1} l(x) \qquad (29)$$

for every vector x.

The following is an interesting generalization of (29).
The following is an interesting generalization of (29). 


Theorem 9. Let Z be a subspace of X, y any vector in X. The distance d(y, Z) of
y to Z is defined to be

$$d(y, Z) = \inf_{z \text{ in } Z} |y - z|. \qquad (30)$$

Then

$$d(y, Z) = \max l(y) \qquad (31)$$

over all l in X' satisfying

$$|l|' \le 1, \quad l(z) = 0 \ \text{ for } z \text{ in } Z. \qquad (32)$$

Proof. By definition of distance, for any ε > 0 there is a $z_0$ in Z such that

$$|y - z_0| < d(y, Z) + \varepsilon. \qquad (33)$$

For any l satisfying (32) we get, using (33), that

$$l(y) = l(y) - l(z_0) = l(y - z_0) \le |l|'\,|y - z_0| < d(y, Z) + \varepsilon.$$

Since ε > 0 is arbitrary, this shows that for all l satisfying (32),

$$l(y) \le d(y, Z). \qquad (34)$$


To show the opposite inequality we shall exhibit a linear function m satisfying (32),
such that m(y) = d(y, Z). Since for y in Z the result is trivial, we assume that the
vector y does not belong to Z. We define the linear subspace U to consist of all
vectors u of the form

$$u = z + ky, \quad z \text{ in } Z, \ k \text{ any real number}. \qquad (35)$$

We define the linear function m(u) in U by

$$m(u) = k\, d(y, Z). \qquad (36)$$

Obviously, m is zero for u in Z; it follows from (35), (36), and the definition (30) of d
that

$$m(u) \le |u| \quad \text{for } u \text{ in } U. \qquad (37)$$

By Hahn-Banach we can extend m to all of X so that (37) holds for all x; then

$$|m|' \le 1. \qquad (37)'$$

Clearly, m satisfies (32); on the other hand, we see by combining (35) and (36) that

$$m(y) = d(y, Z).$$

Since we have seen in (34) that l(y) ≤ d(y, Z) for all l satisfying (32), this completes
the proof of Theorem 9. □
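A hand computation in the plane may help fix ideas. The data below are an assumed example, not from the text: X is $\mathbb{R}^2$ with the Euclidean norm, Z is the horizontal axis, and y = (3, 4).

```latex
% Distance (30): minimize over z = (t, 0) in Z.
d(y, Z) = \inf_{t} \sqrt{(3 - t)^2 + 4^2} = 4 \quad (\text{attained at } t = 3).
% A linear function satisfying (32): l(x_1, x_2) = x_2.
% It vanishes on Z, and |l(x)| = |x_2| \le \sqrt{x_1^2 + x_2^2}, so |l|' \le 1.
l(y) = 4 = d(y, Z).
% By (34) no admissible l can exceed d(y, Z), so this l realizes the maximum in (31).
```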


In Chapter 1 we have introduced the notion of the quotient of a linear space X by 
one of its subspaces Z. We recall the definition: two vectors x; and x» in X are 
congruent mod Z, 


X] z X5 mod Z 


if x; — x» belongs to Z. We saw that this is an equivalence relation, and therefore we 
can partition the vectors in X into congruence classes {}. The set of congruence 
classes [ ) is denoted as X/Z and can be made into a linear space; all this is described 
in Chapter 1. We note that the subspace Z is one of the congruence classes, which 
serves as the zero element of the quotient space. 

Suppose X is a normed linear space; we shall show that then there is a natural way 
of making X/Z into a normed linear space, by defining the following norm for the 
congruence classes: 


{H = inf], xe {}. (38) 
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Theorem 10. Definition (38) is a norm, that is, has all three properties (1). 
Proof. Every member x of a given congruence class {} can be described as 


X = xo —z,xg some vector in {}, z any vector in Z. We claim that property (i), 
positivity, holds: for (] # 0, 


iU] > 0. (38)' 


Suppose on the contrary that || }| = 0. In view of definition (38) this means that there 
is a sequence x; in {} such that 


lim |x;| = 0. (39) 
since all x; belong to the same class, they all can be written as 
Xj = X9 — Zj, z In Z. 
Setting this into (39) we get 


= Á), 


lim Xo — £j 
Since by Theorem 3 every linear subspace Z is closed, it follows that xo belongs to Z. 
But then every point xo — zin |) belongs to Z, and in fact { } = Z. But we saw earlier 
that |) = Z is the zero element of X/Z. Since we have stipulated |] # 0, we have a 
contradiction, that we got into by assuming |{}| = 0. 
Homogeneity is fairly obvious; we turn now to subadditivity: by definition (38) we
can, given any $\varepsilon > 0$, choose $x_0$ in {x} and $y_0$ in {y} so that

$|x_0| < |\{x\}| + \varepsilon, \quad |y_0| < |\{y\}| + \varepsilon$.    (40)

Addition of classes is defined so that $x_0 + y_0$ belongs to {x} + {y}. Therefore by
definition (38), subadditivity of | · |, and (40),

$|\{x\} + \{y\}| \le |x_0 + y_0| \le |x_0| + |y_0| < |\{x\}| + |\{y\}| + 2\varepsilon$.

Since $\varepsilon$ is an arbitrary positive number,

$|\{x\} + \{y\}| \le |\{x\}| + |\{y\}|$

follows. This completes the proof of Theorem 10. □
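
As an illustration of definition (38), the following sketch computes the quotient norm in one concrete case. It assumes X is R^3 with the Euclidean norm and Z is spanned by the columns of a matrix; in that special case the infimum over the class is the distance from x to Z, which a least-squares solve finds. The names `quotient_norm` and `Z_basis` are hypothetical helpers, not notation from the text.

```python
import numpy as np

def quotient_norm(x, Z_basis):
    """Euclidean case of definition (38): inf over z in Z of |x - z|.

    Z is the column span of Z_basis; least squares finds the closest z.
    """
    coeffs, *_ = np.linalg.lstsq(Z_basis, x, rcond=None)
    return np.linalg.norm(x - Z_basis @ coeffs)

Z_basis = np.array([[1.0], [1.0], [0.0]])   # Z = span{(1, 1, 0)}
z = Z_basis[:, 0]
x = np.array([2.0, 0.0, 3.0])
y = np.array([-1.0, 4.0, 1.0])

norm_x = quotient_norm(x, Z_basis)            # norm of the class {x}
norm_shifted = quotient_norm(x + 7 * z, Z_basis)  # same class, same norm
norm_zero_class = quotient_norm(5 * z, Z_basis)   # the class Z itself
norm_sum = quotient_norm(x + y, Z_basis)
```

The value depends only on the class of x, the class Z has norm zero, and the norm of a sum of classes does not exceed the sum of the norms, in line with Theorem 10.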


We conclude this chapter by remarking that a norm in a linear space over the
complex numbers is defined entirely analogously, by the three properties (1). The
theorems proved in the real case extend to the complex. To prove Theorems 7 and 9
in the complex case, we need a complex version of the Hahn-Banach theorem, due
to Bohnenblust-Sobczyk and Sukhomlinov. Here it is:

Theorem 11. X is a linear space over C, and p is a real-valued function defined
on X with the following properties:

(i) p is absolutely homogeneous; that is, it satisfies

$p(ax) = |a| p(x)$

for all complex numbers a and all x in X.
(ii) p is subadditive:

$p(x + y) \le p(x) + p(y)$.

Let U be a subspace of X, and $\ell$ a linear functional defined on U that satisfies

$|\ell(u)| \le p(u)$    (41)

for all u in U.
Then $\ell$ can be extended as a linear functional to the whole space so that

$|\ell(x)| \le p(x)$    (41)'

for all x in X.


Proof. The complex linear space X can also be regarded as a linear space over R.
Any linear function on complex X can be split into its real and imaginary parts:

$\ell(u) = \ell_1(u) + i \ell_2(u)$,

where $\ell_1$ and $\ell_2$ are real-valued, and linear on real U. $\ell_1$ and $\ell_2$ are related by

$\ell_1(iu) = -\ell_2(u)$.

Conversely, if $\ell_1$ is a real-valued linear function over real X,

$\ell(x) = \ell_1(x) - i \ell_1(ix)$    (42)

is linear over complex X.
We turn now to the task of extending $\ell$. It follows from (41) that $\ell_1$, the real part of
$\ell$, satisfies on U the inequality

$\ell_1(u) \le p(u)$.    (43)

Therefore, by the real Hahn-Banach Theorem, $\ell_1$ can be extended to all of X so
that the extended $\ell_1$ is linear on real X and satisfies inequality (43). Define $\ell$ by
formula (42); clearly, it is linear over complex X and is an extension of $\ell$ defined on
U. We claim that it satisfies (41)' for all x in X. To see this, we factor $\ell(x)$ as

$\ell(x) = ar$, r real and nonnegative, |a| = 1.

Using the fact that if $\ell(y)$ is real, it is equal to $\ell_1(y)$, we deduce that

$|\ell(x)| = r = a^{-1}\ell(x) = \ell(a^{-1}x) = \ell_1(a^{-1}x) \le p(a^{-1}x) = p(x)$. □


We conclude this chapter by a curious characterization of Euclidean norms
among all norms. According to equation (53) of Chapter 7, every pair of vectors u, v
in a Euclidean space satisfies the following identity:

$\|u + v\|^2 + \|u - v\|^2 = 2\|u\|^2 + 2\|v\|^2$.

Theorem 12. This identity characterizes Euclidean space. That is, if in a real
normed linear space X

$|u + v|^2 + |u - v|^2 = 2|u|^2 + 2|v|^2$    (44)

for all pairs of vectors u, v, then the norm | | is Euclidean.
Proof. We define a scalar product in X as follows:

$4(x, y) = |x + y|^2 - |x - y|^2$.    (45)

The following properties of a scalar product follow immediately from definition
(45):

$(x, x) = |x|^2$,    (46)

symmetry:

$(y, x) = (x, y)$,    (47)

and

$(x, -y) = -(x, y)$.    (48)

Next we show that (x, y) as defined in (45) is additive:

$(x + z, y) = (x, y) + (z, y)$.    (49)

By definition (45),

$4(x + z, y) = |x + z + y|^2 - |x + z - y|^2$.    (50)

We apply now identity (44) four times:

(i) u = x + y, v = z:

$|x + y + z|^2 + |x + y - z|^2 = 2|x + y|^2 + 2|z|^2$    (51)_i

(ii) u = y + z, v = x:

$|y + z + x|^2 + |y + z - x|^2 = 2|y + z|^2 + 2|x|^2$    (51)_ii

(iii) u = x - y, v = z:

$|x - y + z|^2 + |x - y - z|^2 = 2|x - y|^2 + 2|z|^2$    (51)_iii

(iv) u = z - y, v = x:

$|z - y + x|^2 + |z - y - x|^2 = 2|z - y|^2 + 2|x|^2$    (51)_iv

Add (51)_i and (51)_ii, and subtract from it (51)_iii and (51)_iv; we get, after dividing
by 2,

$|x + y + z|^2 - |x - y + z|^2 = |x + y|^2 - |x - y|^2 + |y + z|^2 - |y - z|^2$.    (52)

The left-hand side of (52) equals 4(x + z, y), and the right-hand side is
4(x, y) + 4(z, y). This proves (49). □
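
Identity (44) and definition (45) are easy to test numerically. The sketch below is only an illustration, with hypothetical helper names: it evaluates the defect of identity (44) for the Euclidean norm and for the maximum norm on R^2, and checks that formula (45) applied to the Euclidean norm returns the usual dot product.

```python
import numpy as np

def parallelogram_defect(norm, u, v):
    """|u+v|^2 + |u-v|^2 - 2|u|^2 - 2|v|^2; zero exactly when (44) holds for u, v."""
    return norm(u + v)**2 + norm(u - v)**2 - 2 * norm(u)**2 - 2 * norm(v)**2

def polarized_product(norm, x, y):
    """Scalar product candidate from (45): (x, y) = (|x+y|^2 - |x-y|^2) / 4."""
    return (norm(x + y)**2 - norm(x - y)**2) / 4

euclid = lambda w: np.linalg.norm(w, 2)
maxnorm = lambda w: np.linalg.norm(w, np.inf)

u = np.array([1.0, 0.0])
v = np.array([0.0, 1.0])
defect_euclid = parallelogram_defect(euclid, u, v)   # identity (44) holds
defect_max = parallelogram_defect(maxnorm, u, v)     # identity (44) fails

x = np.array([1.0, 2.0, -1.0])
y = np.array([0.5, -3.0, 4.0])
recovered = polarized_product(euclid, x, y)          # compare with np.dot(x, y)
```

For this pair u, v the maximum norm violates (44), so by Theorem 12 it is not a Euclidean norm.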


EXERCISE 9. (i) Show that for all rational r,

$(rx, y) = r(x, y)$.

(ii) Show that for all real k,

$(kx, y) = k(x, y)$.


CHAPTER 15 


Linear Mappings Between Normed 
Linear Spaces 


Let X and Y be a pair of finite-dimensional normed linear spaces over the reals; we 
shall denote the norm in both spaces by | |, although they have nothing to do with 
each other. The first lemma shows that every linear map of one normed linear space 
into another is bounded. 


Lemma 1. For any linear map T: X → Y, there is a constant c such that for all x
in X,

$|Tx| \le c|x|$.    (1)

Proof. Express x with respect to a basis $\{x_j\}$:

$x = \sum_j a_j x_j$;    (2)

then

$Tx = \sum_j a_j T x_j$.

By properties of the norm in Y,

$|Tx| \le \sum_j |a_j| |T x_j|$.

From this we deduce that

$|Tx| \le k |x|_\infty$,    (3)

where

$|x|_\infty = \max_j |a_j|, \quad k = \sum_j |T x_j|$.

We have noted in Chapter 14 that $| \cdot |_\infty$ is a norm. Since we have shown in Chapter 14,
Theorem 2, that all norms are equivalent, $|x|_\infty \le \text{const}\, |x|$ and (1) follows from
(3). □


EXERCISE 1. Show that every linear map T: X → Y is continuous, that is, if
$\lim x_n = x$, then $\lim T x_n = Tx$.


In Chapter 7 we have defined the norm of a mapping of one Euclidean space into 
another. Analogously, we have the following definition. 


Definition. The norm of the linear map T: X → Y, denoted as |T|, is

$|T| = \sup_{x \ne 0} \frac{|Tx|}{|x|}$.    (4)

Remark 1. It follows from (1) that |T| is finite.

Remark 2. It is easy to see that |T| is the smallest value we can choose for c in
inequality (1).

Because of the homogeneity of norms, definition (4) can be phrased as follows:

$|T| = \sup_{|x| = 1} |Tx|$.    (4)'


Theorem 2. |T| as defined in (4) and (4)' is a norm in the linear space of all
linear mappings of X into Y.

Proof. Suppose T is nonzero; that means that for some vector $x_0 \ne 0$, $Tx_0 \ne 0$.
Then by (4),

$|T| \ge \frac{|Tx_0|}{|x_0|}$;

since the norms in X and Y are positive, the positivity of |T| follows.

To prove subadditivity we note, using (4)', that when S and T are two mappings of
X → Y, then

$|T + S| = \sup_{|x|=1} |(T + S)x| \le \sup_{|x|=1} (|Tx| + |Sx|) \le \sup_{|x|=1} |Tx| + \sup_{|x|=1} |Sx| = |T| + |S|$.

The crux of the argument is that the supremum of a function that is the sum of two
others is less than or equal to the sum of the separate suprema of the two summands.
Homogeneity is obvious; this completes the proof of Theorem 2. □


Given any mapping T from one linear space X into another Y, we explained in
Chapter 3 that there is another map, called the transpose of T and denoted as T',
mapping Y', the dual of Y, into X', the dual of X. The defining relation between the
two maps is given in equation (9) of Chapter 3:

$(T'\ell, x) = (\ell, Tx)$,    (5)

where x is any vector in X and $\ell$ is any element of Y'. The scalar product on the right,
$(\ell, y)$, denotes the bilinear pairing of elements y of Y and $\ell$ of Y'. The scalar product
(m, x) on the left is the bilinear pairing of elements x in X and m in X'. Relation (5)
defines $T'\ell$ as an element of X'. We have noted in Chapter 3 that (5) is a symmetric
relation between T and T' and that

$T'' = T$,    (6)

just as X'' is X and Y'' is Y.
We have shown in Chapter 14 that there is a natural way of introducing a dual
norm in the dual X' of a normed linear space X, see Theorem 7; for m in X',

$|m|' = \sup_{|x| = 1} (m, x)$.    (7)

The dual norm for $\ell$ in Y' is defined similarly as $\sup (\ell, y)$, |y| = 1; from this definition
[see equation (24) of Chapter 14], it follows that

$(\ell, y) \le |\ell|' |y|$.    (8)


Theorem 3. Let T be a linear mapping from a normed linear space X into
another normed linear space Y, T' its transpose, mapping Y' into X'. Then

$|T'| = |T|$,    (9)

where X' and Y' are equipped with the dual norms.

Proof. Apply definition (7) to $m = T'\ell$:

$|T'\ell|' = \sup_{|x| = 1} (T'\ell, x)$.

Using definition (5) of the transpose, we can rewrite the right-hand side as

$|T'\ell|' = \sup_{|x| = 1} (\ell, Tx)$.

Using the estimate (8) on the right, with y = Tx, we get

$|T'\ell|' \le \sup_{|x| = 1} |\ell|' |Tx|$.

Using (4)' to estimate |Tx| we deduce that

$|T'\ell|' \le |\ell|' |T|$.

By definition (4) of the norm of T', this implies

$|T'| \le |T|$.    (10)

We replace now T by T' in (10); we obtain

$|T''| \le |T'|$.    (10)'

According to (6), T'' = T, and according to Theorem 8 of Chapter 14, the norms in
X'' and Y'', the spaces between which T'' acts, are the same as the norms in X and Y.
This shows that |T''| = |T|; now we can combine (10) and (10)' to deduce (9). This
completes the proof of Theorem 3. □
Let T be a linear map of a linear space X into Y, S another linear map of Y into
another linear space Z. Then, as remarked in Chapter 3, we can define the product ST
as the composite mapping of T followed by S.


Theorem 4. Suppose X, Y, and Z above are normed linear spaces; then

$|ST| \le |S||T|$.    (11)

Proof. By definition (4),

$|Sy| \le |S||y|, \quad |Tx| \le |T||x|$.    (12)

Hence

$|STx| \le |S||Tx| \le |S||T||x|$.    (13)

Applying definition (4) to ST completes the proof of inequality (11). □
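
Relation (9) and inequality (11) can be checked in a concrete setting. The sketch below assumes every space is some R^n carrying the maximum norm; it uses the standard facts that the operator norm is then the largest absolute row sum, and that the dual of the maximum norm is the 1-norm, for which the operator norm is the largest absolute column sum. The matrices are random test data and the helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.standard_normal((3, 4))   # a map from X = R^4 into Y = R^3
S = rng.standard_normal((2, 3))   # a map from Y = R^3 into Z = R^2

def op_norm_max(M):
    """Operator norm when domain and target carry the maximum norm: largest absolute row sum."""
    return np.abs(M).sum(axis=1).max()

def op_norm_one(M):
    """Operator norm when domain and target carry the 1-norm: largest absolute column sum."""
    return np.abs(M).sum(axis=0).max()

norm_T = op_norm_max(T)
norm_T_transpose = op_norm_one(T.T)   # T' acts between the duals, which carry the 1-norm
norm_ST = op_norm_max(S @ T)
bound_ST = op_norm_max(S) * op_norm_max(T)
```

Here `norm_T` and `norm_T_transpose` agree, as (9) predicts, and `norm_ST` does not exceed `bound_ST`, as (11) predicts.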


We recall that a mapping T of one linear space X into another, Y, is called invertible if it
maps X onto Y, and is one-to-one. In this case T has an inverse, denoted as $T^{-1}$.

In Chapter 7, Theorem 15, we have shown that if a mapping B of a Euclidean
space into itself doesn't differ too much from another mapping A that is invertible,
then B, too, is invertible. We present now a straightforward extension of this result to
normed linear spaces.


Theorem 5. Let X and Y be finite-dimensional normed linear spaces of the same
dimension, and let T be a linear mapping of X into Y that is invertible. Let S be
another linear map of X into Y that does not differ too much from T in the sense that

$|S - T| < k, \quad k = \frac{1}{|T^{-1}|}$.    (14)

Then S is invertible.


Proof. We have to show that S is one-to-one and onto. We show first that S is
one-to-one. We argue indirectly; suppose that for $x_0 \ne 0$,

$Sx_0 = 0$.    (15)

Then

$Tx_0 = (T - S)x_0$.

Since T is invertible,

$x_0 = T^{-1}(T - S)x_0$.

Using Theorem 4 and (14) and that $|x_0| > 0$, we get

$|x_0| \le |T^{-1}||T - S||x_0| < |T^{-1}| k |x_0| = |x_0|$,

a contradiction; this shows that (15) is untenable and so S is one-to-one.

According to Corollary B of Theorem 1 in Chapter 3, a mapping S of a
linear space X into another linear space of the same dimension that is one-to-one
is onto. Since we have shown that S is one-to-one, this completes the proof of
Theorem 5. □


Theorem 5 holds for normed linear spaces that are not finite dimensional,
provided that they are complete. Corollary B of Theorem 1 of Chapter 3 does not
hold in spaces of infinite dimension; therefore we need a different, more direct
argument to invert S. We now present such an argument. We start by recalling the
notion of convergence in a normed linear space applied to the space of linear maps.


Definition. Let X, Y be a pair of finite-dimensional normed linear spaces. A
sequence $\{T_n\}$ of linear maps of X into Y is said to converge to the linear map T,
denoted as $\lim_{n \to \infty} T_n = T$, if

$\lim_{n \to \infty} |T_n - T| = 0$.    (16)


Theorem 6. Let X be a normed finite-dimensional linear space, R a linear map
of X into itself whose norm is less than 1:

$|R| < 1$.    (17)

Then

$S = I - R$    (18)

is invertible, and

$S^{-1} = \sum_{k=0}^{\infty} R^k$.    (18)'


Proof. Denote $\sum_{k=0}^{n} R^k$ as $T_n$, and denote $T_n x$ as $y_n$. We claim that $\{y_n\}$ is a Cauchy
sequence; that is, $|y_n - y_j|$ tends to zero as n and j tend to ∞. To see this, we write,
for n > j,

$y_n - y_j = T_n x - T_j x = \sum_{k=j+1}^{n} R^k x$.

By the triangle inequality

$|y_n - y_j| \le \sum_{k=j+1}^{n} |R^k x|$.    (19)

Using repeatedly the multiplicative property of the norm of operators, we conclude
that

$|R^k| \le |R|^k$.

It follows that

$|R^k x| \le |R^k||x| \le |R|^k |x|$.

Set this estimate into (19); we get

$|y_n - y_j| \le \left( \sum_{k=j+1}^{n} |R|^k \right) |x|$.    (20)

Since |R| is assumed to be less than one, the right-hand side of (20) tends to zero as n
and j tend to ∞. This shows that $y_n = T_n x = \sum_{k=0}^{n} R^k x$ is a Cauchy sequence.


According to Theorem 3 of Chapter 14, every Cauchy sequence in a finite-dimensional
normed linear space has a limit. We define the mapping T as

$Tx = \lim_{n \to \infty} T_n x$.    (21)

We claim that T is the inverse of I - R. According to Exercise 1, the mapping I - R
is continuous; therefore it follows from (21) that

$(I - R)Tx = \lim_{n \to \infty} (I - R)T_n x$.

Since $T_n = \sum_{k=0}^{n} R^k$,

$(I - R)T_n x = (I - R) \sum_{k=0}^{n} R^k x = x - R^{n+1}x$.

As n → ∞, the left-hand side tends to (I - R)Tx and the right-hand side tends to x;
this proves that T is the inverse of I - R. □
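
The construction in the proof translates directly into a computation. The following sketch, with an arbitrarily chosen 2 × 2 example and a hypothetical function name, forms the partial sums of the series (18)' and measures how far (I - R) times the partial sum is from the identity; the maximum norm is used throughout.

```python
import numpy as np

def neumann_inverse(R, terms=200):
    """Approximate (I - R)^{-1} by the partial sum T_n = sum_{k=0}^{n} R^k, n = terms."""
    n = R.shape[0]
    total = np.eye(n)     # the k = 0 term
    power = np.eye(n)
    for _ in range(terms):
        power = power @ R         # R^k
        total = total + power
    return total

R = np.array([[0.2, 0.1],
              [0.3, 0.4]])
op_norm = np.linalg.norm(R, np.inf)   # largest absolute row sum; hypothesis (17) needs < 1
T = neumann_inverse(R)
residual = np.linalg.norm((np.eye(2) - R) @ T - np.eye(2), np.inf)
```

The residual equals the norm of a high power of R, so it is bounded by the corresponding power of |R| and is negligible here.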


EXERCISE 2. Show that if for every x in X, $|T_n x - Tx|$ tends to zero as n → ∞,
then $|T_n - T|$ tends to zero.

EXERCISE 3. Show that $T_n = \sum_{k=0}^{n} R^k$ converges to $S^{-1}$ in the sense of definition (16).

Theorem 6 is a special case of Theorem 5, with Y = X and T = I.

EXERCISE 4. Deduce Theorem 5 from Theorem 6 by factoring S = T + S - T
as $T[I - T^{-1}(T - S)]$.

EXERCISE 5. Show that Theorem 6 remains true if the hypothesis (17) is
replaced by the following hypothesis. For some positive integer m,

$|R^m| < 1$.    (22)


EXERCISE 6. Take X = Y = $R^n$, and T: X → X the matrix $(t_{ij})$. Take for the
norm |x| the maximum norm $|x|_\infty$ defined by formula (3) of Chapter 14. Show that
the norm |T| of the matrix $(t_{ij})$, regarded as a mapping of X into X, is

$|T| = \max_i \sum_j |t_{ij}|$.    (23)
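
A numerical check of formula (23) may help before attempting the proof. The sketch below is an illustration, not a solution: it evaluates the largest absolute row sum of a sample matrix and exhibits a vector of maximum norm 1, built from the signs of the heaviest row, at which |Tx| reaches that value. The helper names are hypothetical.

```python
import numpy as np

def max_norm_operator_norm(T):
    """Right-hand side of (23): the largest absolute row sum of T."""
    return np.abs(T).sum(axis=1).max()

def maximizing_vector(T):
    """A vector with maximum norm 1 at which |Tx| attains the row-sum value."""
    i = np.abs(T).sum(axis=1).argmax()
    # sign pattern of the heaviest row; zero entries are mapped to +1
    return np.where(T[i] >= 0, 1.0, -1.0)

T = np.array([[1.0, -2.0, 0.5],
              [3.0, 0.0, -1.0],
              [-0.5, 0.25, 2.0]])
bound = max_norm_operator_norm(T)
x = maximizing_vector(T)
attained = np.abs(T @ x).max()     # |Tx| in the maximum norm
```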

EXERCISE 7. Take X to be $R^n$ normed by the maximum norm $|x|_\infty$, Y to be $R^n$
normed by the 1-norm $|x|_1$, defined by formulas (3) and (4) in Chapter 14. Show that
the norm of the matrix $(t_{ij})$ regarded as a mapping of X into Y is bounded by

$|T| \le \sum_{i,j} |t_{ij}|$.


EXERCISE 8. X is any finite-dimensional normed linear space over C, and T is a
linear mapping of X into X. Denote by $t_j$ the eigenvalues of T, and denote by r(T) its
spectral radius:

$r(T) = \max_j |t_j|$.

(i) Show that $|T| \ge r(T)$.
(ii) Show that $|T^n| \ge r(T)^n$.
(iii) Show, using Theorem 18 of Chapter 7, that

$\lim_{n \to \infty} |T^n|^{1/n} = r(T)$.
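
To see the statements of Exercise 8 at work, the following sketch (an illustration with an arbitrarily chosen matrix, not a proof) compares the nth roots of $|T^n|$ with the spectral radius for a 2 × 2 matrix whose norm is considerably larger than r(T). The maximum norm is assumed.

```python
import numpy as np

T = np.array([[0.5, 1.0],
              [0.0, 0.5]])   # both eigenvalues equal 0.5, yet |T| = 1.5 in the maximum norm

r = max(abs(np.linalg.eigvals(T)))   # spectral radius r(T)

def root_of_power_norm(T, n):
    """|T^n|^(1/n), with | | the operator norm induced by the maximum norm."""
    return np.linalg.norm(np.linalg.matrix_power(T, n), np.inf) ** (1.0 / n)

roots = [root_of_power_norm(T, n) for n in (1, 10, 50, 200)]
```

Each entry of `roots` stays above r(T), in agreement with part (ii), and the entries decrease toward r(T) as n grows, in agreement with part (iii).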


CHAPTER 16 


Positive Matrices 


Definition. A real l × l matrix P is called entrywise positive if all its entries $p_{ij}$ are
positive real numbers.

Caution: This notion of positivity, used only in this chapter, is not to be confused 
with self-adjoint matrices that are positive in the sense of Chapter 10. 


Theorem 1 (Perron). Every positive matrix P has a dominant eigenvalue,
denoted by $\lambda(P)$, which has the following properties:

(i) $\lambda(P)$ is positive and the associated eigenvector h has positive entries:

$Ph = \lambda(P)h, \quad h > 0$.    (1)

(ii) $\lambda(P)$ is a simple eigenvalue.
(iii) Every other eigenvalue $\kappa$ of P is less than $\lambda(P)$ in absolute value:

$|\kappa| < \lambda(P)$.    (2)

(iv) P has no other eigenvector f with nonnegative entries.
Proof. We recall from Chapter 13 that inequality between vectors in $R^n$ means
that the inequality holds for all corresponding components. We denote by p(P) the
set of all nonnegative numbers $\lambda$ for which there is a nonnegative vector x ≠ 0 such
that

$Px \ge \lambda x, \quad x \ge 0$.    (3)

Lemma 2. For P positive,

(i) p(P) is nonempty, and contains a positive number,
(ii) p(P) is bounded,
(iii) p(P) is closed.


Proof. Take any positive vector x; since P is positive, Px is a positive vector.
Clearly, (3) will hold for $\lambda$ small enough positive; this proves (i) of the lemma.
Since both sides of (3) are linear in x, we can normalize x so that

$\xi x = \sum_i x_i = 1, \quad \xi = (1, \ldots, 1)$.    (4)

Multiply (3) by $\xi$ on the left:

$\xi P x \ge \lambda \xi x = \lambda$.    (5)

Denote the largest component of $\xi P$ by b; then $b\xi \ge \xi P$. Setting this into (5) gives
$b \ge \lambda$; this proves part (ii) of the lemma.
To prove (iii) consider a sequence of $\lambda_n$ in p(P) that tends to a limit $\lambda$; by definition there is a
corresponding $x_n \ne 0$ such that (3) holds:

$P x_n \ge \lambda_n x_n, \quad x_n \ge 0$.    (6)

We might as well assume that the $x_n$ are normalized by (4):

$\xi x_n = 1$.

The set of nonnegative x normalized by (4) is a closed bounded set in $R^n$ and
therefore compact. Thus a subsequence of $x_n$ tends to a nonnegative x also
normalized by (4), while $\lambda_n$ tends to $\lambda$. Passing to the limit of (6) shows that x, $\lambda$
satisfy (3); therefore p(P) is closed. This proves part (iii) of the lemma. □


Having shown that p(P) is closed and bounded, it follows that it has a maximum
$\lambda_{\max}$; by (i), $\lambda_{\max} > 0$. We shall show now that $\lambda_{\max}$ is the dominant eigenvalue.

The first thing to show is that $\lambda_{\max}$ is an eigenvalue. Since (3) is satisfied by $\lambda_{\max}$,
there is a nonnegative vector h for which

$Ph \ge \lambda_{\max} h, \quad h \ge 0, \ h \ne 0$;    (7)

we claim that equality holds in (7); for, suppose not, say in the kth component:

$\sum_j p_{ij} h_j \ge \lambda_{\max} h_i, \quad i \ne k$,
$\sum_j p_{kj} h_j > \lambda_{\max} h_k$.    (7)'

Define the vector $x = h + \varepsilon e_k$, where $\varepsilon > 0$ and $e_k$ has kth component equal to 1, all
other components zero. Since P is positive, replacing h by x in (7) increases each
component of the left-hand side: Px > Ph. But only the kth component of the right-hand
side is increased when h is replaced by x. It follows therefore from (7)' that for
$\varepsilon$ small enough positive,

$Px > \lambda_{\max} x$.    (8)

Since this is a strict inequality, we may replace $\lambda_{\max}$ by $\lambda_{\max} + \delta$, $\delta$ positive but so
small that (8) still holds. This shows that $\lambda_{\max} + \delta$ belongs to p(P), contrary to the
maximal character of $\lambda_{\max}$. This proves that $\lambda_{\max}$ is an eigenvalue of P and that there
is a corresponding eigenvector h that is nonnegative.

We claim now that the vector h is positive. For certainly, since P is positive and
$h \ge 0$, $h \ne 0$, it follows that Ph > 0. Since $Ph = \lambda_{\max} h$, h > 0 follows. This proves part (i)
of Theorem 1.

Next we show that $\lambda_{\max}$ is simple. We observe that all eigenvectors of P with
eigenvalue $\lambda_{\max}$ must be proportional to h; for if there were another eigenvector y not
a multiple of h, then we could construct h + cy, c so chosen that $h + cy \ge 0$ but one
of the components of h + cy is zero. This contradicts our argument above that an
eigenvector of P that is nonnegative is in fact positive.

To complete the proof of (ii) we have to show that P has no generalized
eigenvectors for the eigenvalue $\lambda_{\max}$, that is, a vector y such that

$Py = \lambda_{\max} y + ch$.    (9)

By replacing y by -y if necessary we can make sure that c > 0; by replacing y by
y + bh if necessary we can make sure that y is positive; it follows then from (9) and
h > 0 that $Py > \lambda_{\max} y$. But then for $\delta$ small enough, greater than 0,

$Py > (\lambda_{\max} + \delta)y$,

contrary to $\lambda_{\max}$ being the largest number in p(P).


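To prove part (iii), let $\kappa$ be another eigenvalue of P, not equal
to $\lambda_{\max}$, y the corresponding eigenvector, both possibly complex: $Py = \kappa y$;
componentwise,

$\sum_j p_{ij} y_j = \kappa y_i$.

Using the triangle inequality for complex numbers and their absolute values,
we get

$\sum_j p_{ij} |y_j| \ge \left| \sum_j p_{ij} y_j \right| = |\kappa| |y_i|$.    (10)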
i 

Comparing this with (3), we see that $|\kappa|$ belongs to p(P). If $|\kappa|$ were $= \lambda_{\max}$, the
vector

$(|y_1|, \ldots, |y_l|)^T$

would be an eigenvector of P with eigenvalue $\lambda_{\max}$, and thus proportional to h:

$|y_i| = c h_i$.    (11)

Furthermore, the sign of equality would hold in (10). It is well known about complex
numbers that this is the case only if all the $y_i$ have the same complex argument:

$y_i = e^{i\theta} |y_i|, \quad i = 1, \ldots, l$.

Combining this with (11) we see that

$y_i = c e^{i\theta} h_i$, that is, $y = (c e^{i\theta}) h$.

Thus $\kappa = \lambda_{\max}$, and the proof of part (iii) is complete.

To prove (iv) we recall from Chapter 6, Theorem 17, that the product of
eigenvectors of P and its transpose $P^T$ pertaining to different eigenvalues is zero.
Since $P^T$ also is positive, the eigenvector $\xi$ pertaining to its dominant eigenvalue,
which is the same as that of P, has positive entries. Since a positive vector $\xi$ does not
annihilate a nonnegative vector f, part (iv) follows from $\xi f = 0$. This completes the
proof of Theorem 1. □


The above proof is due to Bohnenblust; see R. Bellman, Introduction to Matrix 
Analysis. 
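
Perron's theorem can be observed numerically. The sketch below uses power iteration, which is not the method of the proof above but a standard iteration whose convergence for a positive matrix rests on part (iii); the function name `perron_pair` and the sample matrix are illustrative assumptions. The iterates are normalized as in (4), so that their components sum to 1.

```python
import numpy as np

def perron_pair(P, iters=500):
    """Power iteration for an entrywise positive matrix P.

    Returns approximations to the dominant eigenvalue and to the
    eigenvector h normalized so that its components sum to 1.
    """
    h = np.full(P.shape[0], 1.0 / P.shape[0])
    lam = 0.0
    for _ in range(iters):
        w = P @ h
        lam = w.sum()     # since sum(h) = 1, this tends to the dominant eigenvalue
        h = w / lam       # renormalize as in (4)
    return lam, h

P = np.array([[2.0, 1.0, 1.0],
              [1.0, 3.0, 1.0],
              [1.0, 1.0, 4.0]])
lam, h = perron_pair(P)

# For the positive vector x = (1, 1, 1), the ratios (Px)_i / x_i bracket the
# dominant eigenvalue: the smallest ratio lies in p(P), see (3).
ratios = P @ np.ones(3)
lower, upper = ratios.min(), ratios.max()
```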


EXERCISE 1. Denote by $\tau(P)$ the set of nonnegative $\lambda$ such that

$Px \le \lambda x, \quad x \ge 0$

for some vector x ≠ 0. Show that the dominant eigenvalue $\lambda(P)$ satisfies

$\lambda(P) = \min_{\lambda \in \tau(P)} \lambda$.    (12)


We give now some applications of Perron's theorem. 


Definition. A stochastic matrix is an l × l matrix S whose entries are
nonnegative:

$s_{ij} \ge 0$,    (13)

and whose column sums are equal to 1:

$\sum_i s_{ij} = 1, \quad j = 1, \ldots, l$.    (14)


The interpretation lies in the study of collections of l species, each of which has
the possibility of changing into another. The numbers $s_{ij}$ are called transition
probabilities; they represent the fraction of the population of the jth species that is
replaced by the ith species. Condition (13) is natural for this interpretation; condition
(14) specifies that the total population is preserved. There are interesting
applications where this is not so.

The kind of species that can undergo change describable as in the foregoing are
atomic nuclei, mutants sharing a common ecological environment, and many others.

We shall first study positive stochastic matrices, that is, ones for which (13) is a
strict inequality. To these Perron's theorem is applicable and yields the following
theorem.


Theorem 3. Let S be a positive stochastic matrix.

(i) The dominant eigenvalue $\lambda(S) = 1$.
(ii) Let x be any nonnegative vector, x ≠ 0; then

$\lim_{N \to \infty} S^N x = ch$,    (15)

where h is the dominant eigenvector and c is some positive constant.


Proof. As remarked earlier, if S is a positive matrix, so is its transpose $S^T$. Since,
according to Theorem 16, Chapter 6, S and $S^T$ have the same eigenvalues, it follows
that S and $S^T$ have the same dominant eigenvalue. Now the dominant eigenvalue of
the transpose of a stochastic matrix is easily computed: It follows from (14) that the
vector with all entries 1,

$\xi = (1, \ldots, 1)$,

is a left eigenvector of S, with eigenvalue 1. It follows from part (iv) of Theorem 1 that
this is the dominant eigenvector and 1 is the dominant eigenvalue. This proves part (i).
To prove (ii), we expand x as a sum of eigenvectors $h_j$ of S:

$x = \sum_j c_j h_j$.    (16)

Assuming that all eigenvectors of S are genuine, not generalized, we get

$S^N x = \sum_j c_j \lambda_j^N h_j$.    (16)_N

Here the first component is taken to be the dominant one; so $\lambda_1 = \lambda = 1$, $|\lambda_j| < 1$ for
j ≠ 1. From this and (16)_N, we conclude that

$S^N x \to ch$,    (17)

where $c = c_1$, $h = h_1$, the dominant eigenvector.
To prove that c is positive, form the scalar product of (17) with $\xi$. Since
$\xi = S^T \xi = (S^T)^N \xi$, we get

$(S^N x, \xi) = (x, (S^T)^N \xi) = (x, \xi) \to c(h, \xi)$.    (17)'

We have assumed that x is nonnegative and not equal to 0; $\xi$ and h are positive.
Therefore it follows from (17)' that c is positive. This proves part (ii) of Theorem 3
when all eigenvectors are genuine. The general case can be handled
similarly. □


We turn now to applications of Theorem 3 to systems whose change is governed
by transition probabilities. Denote by $x_1, \ldots, x_n$ the population size of the jth
species, j = 1, ..., n; suppose that during a unit of time (a year, a day, a nanosecond)
each individual of the collection changes into (or gives birth to) a member of the other
species according to the probabilities $s_{ij}$. If the population size is so large that
fluctuations are unimportant, the new size of the population of the ith species will be

$y_i = \sum_j s_{ij} x_j$.    (18)


Combining the components of the old and new population into single column vectors 
x and y, relation (18) can be expressed in the language of matrices as 


$y = Sx$.    (18)'

After N units of time, the population vector will be $S^N x$. The significance of Theorem
3 in such applications is that it shows that as N → ∞, such populations tend to a
steady distribution that does not depend on where the population started from.
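
The convergence described by Theorem 3 is easy to watch in a small example. The sketch below uses a made-up positive stochastic matrix with three species and two different starting populations of the same total size; the matrix entries and the helper `evolve` are illustrative assumptions, not data from the text.

```python
import numpy as np

# Hypothetical positive stochastic matrix: every entry is positive
# and every column sums to 1, as required by (13) and (14).
S = np.array([[0.90, 0.20, 0.10],
              [0.05, 0.70, 0.30],
              [0.05, 0.10, 0.60]])

def evolve(S, x, steps):
    """Apply relation (18)' repeatedly: x -> Sx, `steps` times."""
    for _ in range(steps):
        x = S @ x
    return x

x_a = np.array([100.0, 0.0, 0.0])    # everything starts in the first species
x_b = np.array([10.0, 30.0, 60.0])   # a different start with the same total
limit_a = evolve(S, x_a, 200)
limit_b = evolve(S, x_b, 200)
```

Both runs settle on the same positive vector, a multiple of the dominant eigenvector h; the total population is preserved at every step, which fixes the constant c in (15).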

Theorem 3 is the basis of Google's search strategy. 

Theorem 1, and therefore Theorem 3, depends on the positivity of the matrix P;
in many applications we have to deal with matrices that are merely nonnegative.
How much of Theorem 1 remains true for such matrices?

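The three examples,

$\begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}, \quad \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$,

show different behavior. The first one has a dominant eigenvalue; the second has plus
or minus 1 as eigenvalues, neither dominated by the other; the third has 1 as a double
eigenvalue.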

EXERCISE 2. Show that if some power $P^m$ of P is positive, then P has a dominant
positive eigenvalue.

There are other interesting and useful criteria for nonnegative matrices to have a
dominant positive eigenvalue. These are combinatorial in nature; we shall not speak
about them. There is also the following result, due to Frobenius.


Theorem 4. Every nonnegative l × l matrix F, F ≠ 0, has an eigenvalue $\lambda(F)$
with the following properties:

(i) $\lambda(F)$ is nonnegative, and the associated eigenvector has nonnegative entries:

$Fh = \lambda(F)h, \quad h \ge 0$.    (19)

(ii) Every other eigenvalue $\kappa$ is less than or equal to $\lambda(F)$ in absolute value:

$|\kappa| \le \lambda(F)$.    (20)

(iii) If $|\kappa| = \lambda(F)$, then $\kappa$ is of the form

$\kappa = e^{2\pi i k/m} \lambda(F)$,    (21)

where k and m are positive integers, $m \le l$.


Remark. Theorem 4 can be used to study the asymptotically periodic behavior
for large N of $S^N x$, where S is a nonnegative stochastic matrix. This has applications
to the study of cycles in population growth.
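
A cyclic permutation gives the simplest instance of this periodic behavior. The following sketch is an illustration chosen for this purpose, not an example from the text: the 3 × 3 matrix below is nonnegative and stochastic, all of its eigenvalues have absolute value 1 and are cube roots of unity, in agreement with (21), and the populations $F^N x$ cycle with period 3 instead of converging.

```python
import numpy as np

# Cyclic permutation of three species: nonnegative, column sums equal to 1.
F = np.array([[0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])

eigs = np.linalg.eigvals(F)     # expected: the three cube roots of unity

x = np.array([5.0, 3.0, 2.0])
orbit = [np.linalg.matrix_power(F, N) @ x for N in range(4)]   # x, Fx, F^2 x, F^3 x
```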


Proof. Approximate F by a sequence $F_n$ of positive matrices. Since the
characteristic equations of $F_n$ tend to the characteristic equation of F, it follows
that the eigenvalues of $F_n$ tend to the eigenvalues of F. Now define

$\lambda(F) = \lim_{n \to \infty} \lambda(F_n)$.

Clearly, as n → ∞, inequality (20) follows from inequality (2) for $F_n$. To prove (i),
we use the dominant eigenvector $h_n$ of $F_n$, normalized as in (4):

$\xi h_n = 1, \quad \xi = (1, \ldots, 1)$.

By compactness, a subsequence of $h_n$ converges to a limit vector h. Being the limit
of normalized positive vectors, h is nonnegative. Each $h_n$ satisfies an equation

$F_n h_n = \lambda(F_n) h_n$;

letting n tend to ∞ we obtain relation (19) in the limit.

Part (iii) is trivial when $\lambda(F) = 0$; so we may assume $\lambda(F) > 0$; at the cost of
multiplying F by a constant we may assume that $\lambda(F) = 1$. Let $\kappa$ be a complex
eigenvalue of F, $|\kappa| = \lambda(F) = 1$; then $\kappa$ can be written as

$\kappa = e^{i\theta}$.    (22)

Denote by y + iz the corresponding eigenvector:

$F(y + iz) = e^{i\theta}(y + iz)$.    (23)

Separate the real and imaginary parts:

$Fy = \cos\theta \, y - \sin\theta \, z$,
$Fz = \sin\theta \, y + \cos\theta \, z$.    (23)'


The geometric interpretation of (23)' is that in the plane spanned by the vectors y and
z, F is rotation around the origin by $\theta$.
Consider now the plane formed by all points x of the form

$x = h + ay + bz$,    (24)

a and b arbitrary real numbers, h the eigenvector (19). It follows from (19) and (23)'
that in this plane F acts as rotation by $\theta$. Consider now the set Q formed by all
nonnegative vectors x of form (24); if Q contains an open subset of the plane (24), it
is a polygon. Since F is a nonnegative matrix, it maps Q into itself; since it is a
rotation, it maps Q onto itself. Since Q has l vertices, the lth power of F is the
identity; this shows that F rotates Q by an angle $\theta = 2\pi k/l$.

It is essential for this argument that Q be a polygon, that is, that it contain an open
set of the plane (24). This will be the case when all components of h are positive or
when some components of h are zero, but so are the corresponding components of y
and z. For then all points x of form (24) with |a|, |b| small enough belong to Q; in this
case Q is a polygon.

To complete the proof of Theorem 4(iii) we turn to the case when some
components of h are zero but the corresponding components of y or z are not.
Arrange the components in such an order that the first j components of h are zero, the
rest positive. Then it follows from Fh = h that F has the following block form:

$F = \begin{pmatrix} F_0 & 0 \\ A & B \end{pmatrix}$.    (25)


Denote by $y_0$ and $z_0$ the vectors formed by the first j components of y and z. By
assumption, $y_0 + iz_0 \ne 0$. Since by (23), y + iz is an eigenvector of F with
eigenvalue $e^{i\theta}$, it follows from (25) that $y_0 + iz_0$ is an eigenvector of $F_0$:

$F_0(y_0 + iz_0) = e^{i\theta}(y_0 + iz_0)$.

Since $F_0$ is a nonnegative j × j matrix, it follows from part (ii) of Theorem 4 already
established that the dominant eigenvalue $\lambda(F_0)$ cannot be less than $|e^{i\theta}| = 1$. We
claim that equality holds: $\lambda(F_0) = 1$. For, suppose not; then the corresponding
eigenvector $h_0$ would satisfy

$F_0 h_0 = (1 + \delta)h_0, \quad h_0 \ge 0, \ \delta > 0$.    (26)

Denote by k the l-vector whose first j components are those of $h_0$, the rest are zero. It
follows from (26) that

$Fk \ge (1 + \delta)k$.    (26)'

It is easy to show that the dominant eigenvalue $\lambda(F)$ of a nonnegative matrix can be
characterized as the largest $\lambda$ for which (3) can be satisfied. Inequality (26)' would
imply that $\lambda(F) \ge 1 + \delta$, contrary to the normalization $\lambda(F) = 1$. This proves that
$\lambda(F_0) = 1$.

We do now an induction with respect to j on part (iii) of Theorem 4. Since $e^{i\theta}$ is an
eigenvalue of the j × j matrix $F_0$, and $\lambda(F_0) = 1$, and since j < l, it follows by the
induction hypothesis that $\theta$ is a rational multiple of $2\pi$ with denominator less than or
equal to j. This completes the proof of Theorem 4. □


CHAPTER 17 


How to Solve Systems 
of Linear Equations 


To get numerical answers out of any linear model, one must in the end obtain the 
solution of a system of linear equations. To carry out this task efficiently has 
therefore a high priority; it is not surprising that it has engaged the attention of some 
of the leading mathematicians. Two methods still in current use, Gaussian 
elimination and the Gauss-Seidel iteration, were devised by the Prince of 
Mathematicians. The great Jacobi invented an iterative method that bears his name. 

The availability of programmable, high-performance computers with large 
memories—and remember, yesterday’s high-performance computer is today’s 
pocket computer—has opened the floodgates; the size and scope of linear equations 
that could be solved efficiently has been enlarged enormously and the role of linear 
models correspondingly enhanced. The success of this effort has been due not only 
to the huge increase in computational speed and in the size of rapid access memory, 
but in equal measure to new, sophisticated, mathematical methods for solving linear 
equations. At the time von Neumann was engaged in inventing and building a 
programmable electronic computer, he devoted much time to analyzing the 
accumulation and amplification of round-off errors in Gaussian elimination. Other 
notable early efforts were the very stable methods that Givens and Householder 
found for reducing matrices to Jacobi form (see Chapter 18). 

It is instructive to recall that in the 1940s linear algebra was dead as a subject for 
research; it was ready to be entombed in textbooks. Yet only a few years later, in 
response to the opportunities created by the availability of high-speed computers, 
very fast algorithms were found for the standard matrix operations that astounded 
those who thought there were no surprises left in this subject. 

In this chapter we describe a few representative modern algorithms for solving 
linear equations. Included among them, in Section 4, is the conjugate gradient 
method developed by Lanczos, Stiefel, and Hestenes. 

The systems of linear equations considered in this chapter are of the class that
have exactly one solution. Such a system can be written in the form

$Ax = b$,    (1)

A an invertible square matrix, b some given vector, x the vector of unknowns to be
determined.

An algorithm for solving the system (1) takes as its input the matrix A and the
vector b and produces as output some approximation to the solution x. In designing
and analyzing an algorithm we must first understand how fast and how accurately an
algorithm works when all the arithmetic operations are carried out exactly. Second,
we must understand the effect of rounding, inevitable in computers that do their
arithmetic with a finite number of digits.

With algorithms employing billions of operations, there is a very real danger that 
round-off errors not only accumulate but are magnified in the course of the 
calculation. Algorithms for which this does not happen are called arithmetically 
stable. 

It is important to point out that the use of finite digit arithmetic places an absolute 
limitation on the accuracy with which the solution can be determined. To understand 
this, imagine a change $\delta b$ being made in the vector b appearing on the right in (1). 
Denote by $\delta x$ the corresponding change in x: 

$$A(x + \delta x) = b + \delta b. \tag{2}$$

Since according to (1), Ax = b, we deduce that 

$$A\,\delta x = \delta b. \tag{3}$$


We shall compare the relative change in x with the relative change in b, that is, the 
ratio 


$$\frac{|\delta x|}{|x|} \bigg/ \frac{|\delta b|}{|b|}, \tag{4}$$

where the norm is convenient for the problem. The choice of relative change is 
natural when the components of vectors are floating point numbers. 

We rewrite (4) as 


$$\frac{|b|}{|x|}\,\frac{|\delta x|}{|\delta b|} = \frac{|Ax|}{|x|}\,\frac{|A^{-1}\delta b|}{|\delta b|}. \tag{4'}$$

The sensitivity of problem (1) to changes in b is estimated by the maximum of (4)' over 
all possible x and $\delta b$. The maximum of the first factor on the right in (4)' is |A|, the 
norm of A; the maximum of the second factor is $|A^{-1}|$, the norm of $A^{-1}$. Thus we 
conclude that the ratio (4) of the relative error in the solution x to the relative error in 
b cannot be larger than 

$$\kappa(A) = |A|\,|A^{-1}|. \tag{5}$$

The quantity $\kappa(A)$ is called the condition number of the matrix A. 


EXERCISE 1. Show that $\kappa(A)$ is $\ge 1$. 


Since in k-digit floating point arithmetic the relative error in b can be as large as 
$10^{-k}$, it follows that if equation (1) is solved using k-digit floating point arithmetic, 
the relative error in x can be as large as $10^{-k}\kappa(A)$. 

It is not surprising that the larger the condition number $\kappa(A)$, the harder it is to 
solve equation (1), for $\kappa(A) = \infty$ when the matrix A is not invertible. As we shall 
show later in this chapter, the rate of convergence of iterative methods to the exact 
solution of (1) is slow when $\kappa(A)$ is large. 

Denote by $\beta$ the largest absolute value of the eigenvalues of A. Clearly, 

$$\beta \le |A|. \tag{6}$$

Denote by $\alpha$ the smallest absolute value of the eigenvalues of A. Then applying 
inequality (6) to the matrix $A^{-1}$ we get 

$$\alpha^{-1} \le |A^{-1}|. \tag{6'}$$

Combining (6) and (6)' with (5) we obtain this lower bound for the condition number 
of A: 

$$\frac{\beta}{\alpha} \le \kappa(A). \tag{7}$$


An algorithm that, when all arithmetic operations are carried out exactly, 
furnishes in a finite number of steps the exact solution of (1) is called a direct 
method. Gaussian elimination discussed in Chapter 4 is such a method. An algorithm 
that generates a sequence of approximations that tend, if all arithmetic operations 
were carried out exactly, to the exact solution is called an iterative method. In this 
chapter we shall investigate the convergence and rate of convergence of several 
iterative methods. 

Let us denote by $\{x_n\}$ the sequence of approximations generated by an 
algorithm. The deviation of $x_n$ from x is called the error at the nth stage, and is 
denoted by $e_n$: 

$$e_n = x_n - x. \tag{8}$$

The amount by which the nth approximation fails to satisfy equation (1) is called the 
nth residual, and is denoted by $r_n$: 

$$r_n = Ax_n - b. \tag{9}$$

Residual and error are related to each other by 

$$r_n = Ae_n. \tag{10}$$

Note that, since we do not know x, we cannot calculate the errors $e_n$; but once we 
have calculated $x_n$ we can by formula (9) calculate $r_n$. 

In what follows, we shall restrict our analysis to the case when the matrix A is 
real, self-adjoint, and positive; see Chapter 8 and Chapter 10 for the definition of 
these concepts. We shall use the Euclidean norm, denoted as $\|\cdot\|$, to measure the size 
of vectors. 

We denote by $\alpha$ and $\beta$ the smallest and largest eigenvalues of A. Positive 
definiteness of A implies that $\alpha$ is positive; see Theorem 1 of Chapter 10. We recall 
from Chapter 8, Theorem 12, that the norm of a positive matrix with respect to the 
Euclidean norm is its largest eigenvalue: 

$$\|A\| = \beta. \tag{11}$$

Since $A^{-1}$ also is positive, we conclude that 

$$\|A^{-1}\| = \alpha^{-1}. \tag{11'}$$

Recalling the definition (5) of the condition number $\kappa$ of A, we conclude that for A 
self-adjoint and positive, 

$$\kappa = \frac{\beta}{\alpha}. \tag{12}$$
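
To make the role of the condition number concrete, here is a minimal sketch in Python. The 2 × 2 matrix and the helper names `solve2` and `sym_eigs` are illustrative choices of ours, not taken from the text. For a symmetric positive matrix it computes the condition number as the ratio $\beta/\alpha$ of the largest to the smallest eigenvalue, and then exhibits a right-hand side b and a perturbation $\delta b$ for which the ratio (4) of relative changes actually reaches this value.

```python
import math

def solve2(A, b):
    """Solve a 2 x 2 system exactly by Cramer's rule."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    return [(a22 * b[0] - a12 * b[1]) / det,
            (a11 * b[1] - a21 * b[0]) / det]

def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(t * t for t in v))

def sym_eigs(A):
    """Smallest and largest eigenvalue of a symmetric 2 x 2 matrix."""
    (a, b), (_, d) = A
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b * b)
    return mean - radius, mean + radius

A = [[1.0, 0.99], [0.99, 1.0]]      # symmetric, positive, nearly singular
alpha, beta = sym_eigs(A)           # about 0.01 and 1.99
kappa = beta / alpha                # condition number in the Euclidean norm

b = [1.0, 1.0]                      # eigenvector belonging to beta
db = [1e-6, -1e-6]                  # eigenvector belonging to alpha
x = solve2(A, b)
dx = solve2(A, db)                  # by (3), A dx = db

# Ratio (4): relative change in x over relative change in b.
ratio = (norm(dx) / norm(x)) / (norm(db) / norm(b))
print(kappa, ratio)
```

The perturbation is chosen along the eigenvector of the smallest eigenvalue and b along that of the largest, which is the worst case; for a generic pair b, $\delta b$ the ratio is smaller.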


1. THE METHOD OF STEEPEST DESCENT 


The first iterative method we investigate is based on the variational characterization 
of the solution of equation (1) in the case when A is positive definite. 


Theorem 1. The solution x of (1) minimizes the functional 

$$E(y) = \tfrac{1}{2}(y, Ay) - (y, b); \tag{13}$$

here $(\,,)$ denotes the Euclidean scalar product of vectors. 

Proof. We add to E(y) a constant, that is, a term independent of y: 

$$F(y) = E(y) + \tfrac{1}{2}(x, b). \tag{14}$$
Set (13) into (14); using Ax = b and the self-adjointness of A we can express 
F(y) as 

$$F(y) = \tfrac{1}{2}(y - x, A(y - x)). \tag{14'}$$

Clearly, 

$$F(x) = 0.$$

A being positive means that (v, Av) > 0 for $v \ne 0$. Thus (14)' shows that F(y) > 0 
for $y \ne x$. This proves that F(y), and therefore E(y), takes on its minimum at 
y = x. 


Theorem 1 shows that the task of solving (1) can be accomplished by minimizing 
E. To find the point where E assumes its minimum we shall use the method of 
steepest descent; that is, given an approximate minimizer y, we find a better 
approximation by moving from y to a new point along the direction of the negative 
gradient of E. The gradient of E is easily computed from formula (13): 

$$\operatorname{grad} E(y) = Ay - b.$$

So if our nth approximation is $x_n$, then the (n + 1)st, $x_{n+1}$, is 

$$x_{n+1} = x_n - s(Ax_n - b), \tag{15}$$


where s is the step length in the direction $-\operatorname{grad} E$. Using the concept (9) of 
residual, we can rewrite (15) as 

$$x_{n+1} = x_n - s r_n. \tag{15'}$$


We determine s so that $E(x_{n+1})$ is as small as possible. This quadratic minimum 
problem is easily solved; using (13) and (9), we have 

$$\begin{aligned} E(x_{n+1}) &= \tfrac{1}{2}(x_n - s r_n, A(x_n - s r_n)) - (x_n - s r_n, b) \\ &= E(x_n) - s(r_n, r_n) + \tfrac{1}{2}s^2 (r_n, A r_n). \end{aligned} \tag{15''}$$

Its minimum is reached for 

$$s_n = \frac{(r_n, r_n)}{(r_n, A r_n)}. \tag{16}$$

Theorem 2. The sequence of approximations defined by (15), with s given by 
(16), converges to the solution x of (1). 

Proof. We need a couple of inequalities. We recall from Chapter 8 that for any 
vector r the Rayleigh quotient 

$$\frac{(r, Ar)}{(r, r)}$$

of a self-adjoint matrix A lies between the smallest and largest eigenvalues of A. In 
our case these were denoted by $\alpha$ and $\beta$; so we deduce from (16) that 

$$\frac{1}{\beta} \le s_n \le \frac{1}{\alpha}. \tag{17}$$

We conclude similarly that for all vectors r, 

$$\frac{1}{\beta} \le \frac{(r, A^{-1}r)}{(r, r)} \le \frac{1}{\alpha}. \tag{17'}$$

We show now that $F(x_n)$ tends to zero as n tends to $\infty$. Since we saw in Theorem 
1 that F(y), defined in (14), is positive everywhere except at y = x, it would follow 
that $x_n$ tends to x. 

We recall the concept (8) of error $e_n = x_n - x$, and its relation (10) to the residual, 
$Ae_n = r_n$. We can, using (14)' to express F, write 

$$F(x_n) = \tfrac{1}{2}(e_n, Ae_n) = \tfrac{1}{2}(e_n, r_n) = \tfrac{1}{2}(r_n, A^{-1}r_n). \tag{18}$$

Since E and F differ only by a constant, we deduce from (15)'' that 

$$F(x_{n+1}) = F(x_n) - s(r_n, r_n) + \tfrac{1}{2}s^2 (r_n, Ar_n).$$

Using the value (16) for s, we obtain 

$$F(x_{n+1}) = F(x_n) - \tfrac{1}{2}s_n (r_n, r_n). \tag{18'}$$

Using (18), we can restate (18)' as 

$$F(x_{n+1}) = F(x_n)\left[1 - s_n \frac{(r_n, r_n)}{(r_n, A^{-1}r_n)}\right]. \tag{19}$$

Using inequalities (17) and (17)', we deduce from (19) that 

$$F(x_{n+1}) \le \left(1 - \frac{\alpha}{\beta}\right) F(x_n).$$

Applying this inequality recursively, we get, using (12), that 

$$F(x_n) \le \left(1 - \frac{1}{\kappa}\right)^n F(x_0). \tag{20}$$

Using the boundedness of the Rayleigh quotient from below by the smallest 
eigenvalue, we conclude from (18) that 

$$\frac{\alpha}{2}\,\|e_n\|^2 \le F(x_n).$$

Combining this with (20) we conclude that 

$$\|e_n\|^2 \le \frac{2}{\alpha}\left(1 - \frac{1}{\kappa}\right)^n F(x_0). \tag{21}$$

This shows that the error $e_n$ tends to zero, as asserted in Theorem 2. □
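
The iteration analyzed in Theorem 2 is short enough to write out. The following is a minimal sketch in plain Python, assuming a small dense symmetric positive matrix stored as a list of rows; the function names, the stopping tolerance, and the test system are our own illustrative choices. Each pass computes the residual (9), the step length (16), and the update (15)'.

```python
def matvec(A, v):
    """Product of a dense matrix (list of rows) with a vector."""
    return [sum(a * t for a, t in zip(row, v)) for row in A]

def dot(u, v):
    """Euclidean scalar product."""
    return sum(s * t for s, t in zip(u, v))

def steepest_descent(A, b, x0, tol=1e-12, max_steps=10_000):
    """Approximate the solution of Ax = b for A symmetric and positive."""
    x = list(x0)
    for step in range(max_steps):
        r = [ax - bi for ax, bi in zip(matvec(A, x), b)]   # residual (9)
        rr = dot(r, r)
        if rr <= tol * tol:                                # residual small enough
            return x, step
        s = rr / dot(r, matvec(A, r))                      # step length (16)
        x = [xi - s * ri for xi, ri in zip(x, r)]          # update (15)'
    return x, max_steps

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x, steps = steepest_descent(A, b, [0.0, 0.0])
print(x, steps)   # the exact solution is (1/11, 7/11)
```

The loop stops on the size of the residual rather than the error, since, as noted after (10), the error cannot be computed without knowing x.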


2. AN ITERATIVE METHOD USING CHEBYSHEV POLYNOMIALS 


Estimate (21) suggests that when the condition number $\kappa$ of A is large, $x_n$ converges 
to x very slowly. This in fact is the case; therefore there is need to devise iterative 
methods that converge faster; this will be carried out in the present and the following 
sections. 

For the method described in this section we need a priori a positive lower bound 
for the smallest eigenvalue of A and an upper bound for its largest eigenvalue: 
$m < \alpha$, $\beta < M$. It follows that all eigenvalues of A lie in the interval [m, M]. 
According to (12), $\kappa = \beta/\alpha$; therefore $\kappa < M/m$. If m and M are sharp bounds, then 
$\kappa \simeq M/m$. 

We generate the sequence of approximations $\{x_n\}$ by the same recursion formula 
(15) as before, 

$$x_{n+1} = (I - s_n A)x_n + s_n b, \tag{22}$$

but we shall choose the step lengths $s_n$ to be optimal after N steps, not after each step; 
here N is some appropriately chosen number. 

Since the solution x of (1) satisfies $x = (I - s_n A)x + s_n b$, we obtain after 
subtracting this from (22) that 

$$e_{n+1} = (I - s_n A)e_n. \tag{23}$$

From this we deduce recursively that 

$$e_N = P_N(A)e_0, \tag{24}$$
where $P_N$ is the polynomial 

$$P_N(a) = \prod_{n=1}^{N} (1 - s_n a). \tag{24'}$$

From (24) we can estimate the size of $e_N$: 

$$\|e_N\| \le \|P_N(A)\|\,\|e_0\|. \tag{25}$$

Since the matrix A is self-adjoint, so is $P_N(A)$. It was shown in Chapter 8 that the 
norm of a self-adjoint matrix is max |p|, p any eigenvalue of $P_N(A)$. According to 
Theorem 4 of Chapter 6, the spectral mapping theorem, the eigenvalues p of $P_N(A)$ 
are of the form $p = P_N(a)$, where a is an eigenvalue of A. Since the eigenvalues of A 
lie in the interval [m, M], we conclude that 

$$\|P_N(A)\| \le \max_{m \le a \le M} |P_N(a)|. \tag{26}$$


Clearly, to get the best estimate for $\|e_N\|$ out of inequalities (25) and (26), we have 
to choose the $s_n$, n = 1, ..., N, so that the polynomial $P_N$ has as small a maximum on 
[m, M] as possible. Polynomials of form (24)' satisfy the normalizing condition 

$$P_N(0) = 1. \tag{27}$$

Among all polynomials of degree N that satisfy (27), the one that has smallest 
maximum on [m, M] is the rescaled Chebyshev polynomial. We recall that the Nth 
Chebyshev polynomial $T_N$ is defined for $-1 \le u \le 1$ by 

$$T_N(u) = \cos N\theta, \qquad u = \cos\theta. \tag{28}$$

The rescaling takes [−1, 1] into [m, M] and enforces (27): 

$$P_N(a) = T_N\!\left(\frac{M + m - 2a}{M - m}\right) \bigg/ T_N\!\left(\frac{M + m}{M - m}\right). \tag{29}$$

It follows from definition (28) that $|T_N(u)| \le 1$ for $|u| \le 1$. From this and (29) we 
deduce using $M/m \simeq \kappa$ that 

$$\max_{m \le a \le M} |P_N(a)| = 1 \bigg/ T_N\!\left(\frac{M + m}{M - m}\right) \simeq 1 \bigg/ T_N\!\left(\frac{\kappa + 1}{\kappa - 1}\right). \tag{29'}$$

Setting this into (26) and using (25), we get 

$$\|e_N\| \le \|e_0\| \bigg/ T_N\!\left(\frac{\kappa + 1}{\kappa - 1}\right). \tag{30}$$

Since outside the interval [−1, 1] the Chebyshev polynomials tend to infinity, this 
proves that $e_N$ tends to zero as N tends to $\infty$. 

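How fast $e_N$ tends to zero depends on how large $\kappa$ is. This calls for estimating 
$T_N(1 + \epsilon)$, $\epsilon$ small; we take $\theta$ in (28) imaginary: 

$$\theta = i\phi, \qquad u = \cos i\phi = \frac{e^{\phi} + e^{-\phi}}{2} = 1 + \epsilon.$$

This is a quadratic equation for $e^{\phi}$, whose solution is 

$$e^{\phi} = 1 + \epsilon + \sqrt{2\epsilon + \epsilon^2} = 1 + \sqrt{2\epsilon} + O(\epsilon).$$

So 

$$T_N(1 + \epsilon) = \cos iN\phi = \frac{e^{N\phi} + e^{-N\phi}}{2} \simeq \frac{1}{2}\left(1 + \sqrt{2\epsilon}\right)^N.$$

Now set $(\kappa + 1)/(\kappa - 1) = 1 + \epsilon$; then $\epsilon \simeq 2/\kappa$, and 

$$T_N\!\left(\frac{\kappa + 1}{\kappa - 1}\right) \simeq \frac{1}{2}\left(1 + \frac{2}{\sqrt{\kappa}}\right)^N. \tag{31}$$

Substituting this evaluation into (30) gives 

$$\|e_N\| \lesssim 2\left(1 + \frac{2}{\sqrt{\kappa}}\right)^{-N} \|e_0\| \simeq 2\left(1 - \frac{2}{\sqrt{\kappa}}\right)^{N} \|e_0\|. \tag{32}$$

Clearly, $e_N$ tends to zero as N tends to infinity. 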

When $\kappa$ is large, $\sqrt{\kappa}$ is very much smaller than $\kappa$; therefore for $\kappa$ large, the upper 
bound (32) for $\|e_N\|$ is very much smaller than the upper bound (21), n = N. This 
shows that the iterative method described in this section converges faster than the 
method described in Section 1. Put in another way, to achieve the same accuracy, we 
need to take far fewer steps when we use the method of this section than the method 
described in Section 1. 


EXERCISE 2. Suppose $\kappa = 100$, $\|e_0\| = 1$, and $(1/\alpha)F(x_0) = 1$; how large do 
we have to take N in order to make || ey || < 1077, (a) using the method in Section 1, 
(b) using the method in Section 2? 


To implement the method described in this section we have to pick a value of N. 
Once this is done, the values of $s_n$, n = 1, ..., N, are according to (24)' determined as 
the reciprocals of the roots of the modified Chebyshev polynomials (29): 

$$s_k^{-1} = \frac{1}{2}\left(M + m - (M - m)\cos\frac{(k + 1/2)\pi}{N}\right),$$

k any integer between 0 and N − 1. Theoretically, that is, imagining all arithmetic 
operations to be carried out exactly, it does not matter in what order we arrange the 
numbers $s_k$. Practically, that is, operating with finite floating-point numbers, it 
matters a great deal. Half the roots of $P_N$ lie in the left half of the interval [m, M]; for 
these roots, s > 2/(M + m), and so the matrix (I − sA) has eigenvalues greater than 
1 in absolute value. Repeated application of such matrices could fatally magnify 
round-off errors and render the algorithm arithmetically unstable. 

There is a way of mitigating this instability; the other half of the roots of $P_N$ lie in 
the other half of the interval [m, M], and for these s all eigenvalues of the matrix 
(I − sA) are less than 1 in absolute value. The trick is to alternate an unstable $s_k$ 
with a stable $s_k$. 
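
As an illustration of the choice of step lengths, the sketch below (plain Python; the bounds m = 1, M = 100, the degree N = 8, and all function names are assumptions made for the example) computes the $s_k$ as reciprocals of the roots of the rescaled Chebyshev polynomial, checks numerically that the maximum of $|P_N|$ on [m, M] agrees with $1/T_N((M + m)/(M - m))$, and builds one possible alternating order of unstable and stable steps. The pairing used here is only one simple way of alternating, not a prescription from the text.

```python
import math

def chebyshev_steps(m, M, N):
    """Step lengths s_k = 1 / a_k, where a_k are the roots of P_N on [m, M]."""
    return [2.0 / (M + m - (M - m) * math.cos((k + 0.5) * math.pi / N))
            for k in range(N)]

def P(steps, a):
    """Evaluate P_N(a) as the product of the factors (1 - s a), as in (24)'."""
    value = 1.0
    for s in steps:
        value *= 1.0 - s * a
    return value

def cheb_T(N, u):
    """T_N(u) for u >= 1, using an imaginary angle: cosh(N arccosh u)."""
    return math.cosh(N * math.acosh(u))

m, M, N = 1.0, 100.0, 8
steps = chebyshev_steps(m, M, N)

# Sample |P_N| on a grid of [m, M] that includes both endpoints.
grid = [m + (M - m) * i / 2000 for i in range(2001)]
observed_max = max(abs(P(steps, a)) for a in grid)
predicted_max = 1.0 / cheb_T(N, (M + m) / (M - m))

# Steps with s > 2/(M + m) can amplify errors; the others cannot.
unstable = [s for s in steps if s > 2.0 / (M + m)]
stable = [s for s in steps if s <= 2.0 / (M + m)]
alternating = [s for pair in zip(unstable, stable) for s in pair]
print(observed_max, predicted_max, len(unstable), len(stable))
```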


3. A THREE-TERM ITERATION USING CHEBYSHEV 
POLYNOMIALS 


We describe now an entirely different way of generating the approximations 
described in Section 2, based on a recursion relation linking three consecutive 
Chebyshev polynomials. This relation is based on the addition formula of cosine: 

$$\cos(n \pm 1)\theta = \cos\theta \cos n\theta \mp \sin\theta \sin n\theta.$$

Adding these yields 

$$\cos(n + 1)\theta + \cos(n - 1)\theta = 2\cos\theta \cos n\theta.$$

Using the definition (28) of Chebyshev polynomials we get 

$$T_{n+1}(u) + T_{n-1}(u) = 2uT_n(u).$$


The polynomials $P_n$, defined in (29), are rescaled Chebyshev polynomials; therefore 
they satisfy an analogous recursion relation: 

$$P_{n+1}(a) = (u_n a + v_n)P_n(a) + w_n P_{n-1}(a). \tag{33}$$

We will not bother to write down the exact values of $u_n$, $v_n$, $w_n$, except to note that, by 
construction, $P_n(0) = 1$ for all n; it follows from this and (33) that 

$$v_n + w_n = 1. \tag{33'}$$


We define now a sequence $x_n$ recursively; we pick $x_0$, set $x_1 = (u_0 A + I)x_0 - u_0 b$, 
and for $n \ge 1$ 

$$x_{n+1} = (u_n A + v_n)x_n + w_n x_{n-1} - u_n b. \tag{34}$$

Note that this is a three-term recursion formula, that is, $x_{n+1}$ is determined in terms 
of $x_n$ and $x_{n-1}$. Formulas (15) and (22) used in the last sections are two-term 
recursion formulas. 

Subtract x from both sides of (34); using (33)' and Ax = b we get a recursion 
formula for the errors: 

$$e_{n+1} = (u_n A + v_n)e_n + w_n e_{n-1}. \tag{34'}$$

Solving (34)' recursively, it follows that each $e_n$ can be expressed in the form 
$e_n = Q_n(A)e_0$, where the $Q_n$ are polynomials of degree n, with $Q_0 = 1$. Setting this 
form of $e_n$ into (34)', we conclude that the polynomials $Q_n$ satisfy the same recursion 
relation as the $P_n$; since $Q_0 = P_0 = 1$, it follows that $Q_n = P_n$ for all n. Therefore 

$$e_n = P_n(A)e_0 \tag{35}$$

for all n, and not just a single preassigned value N as in equation (24) of Section 2. 


4. OPTIMAL THREE-TERM RECURSION RELATION 

In this section we shall use a three-term recursion relation of the form 

$$x_{n+1} = (s_n A + p_n I)x_n + q_n x_{n-1} - s_n b \tag{36}$$

to generate a sequence of approximations that converges extremely rapidly to x. 
Unlike (34), the coefficients $s_n$, $p_n$, and $q_n$ are not fixed in advance but will be 
evaluated in terms of $r_{n-1}$ and $r_n$, the residuals corresponding to the approximations 
$x_{n-1}$ and $x_n$. Furthermore, we need no a priori estimates m, M for the eigenvalues 
of A. 

The first approximation $x_0$ is an arbitrary (or educated) guess. We shall use the 
corresponding residual, $r_0 = Ax_0 - b$, to completely determine the sequence of 
coefficients in (36), in a somewhat roundabout fashion. We pose the following 
minimum problem: 

Among all polynomials of degree n that satisfy the normalizing condition 

$$Q(0) = 1, \tag{37}$$

determine the one that makes 

$$\|Q(A)r_0\| \tag{38}$$

as small as possible. 

We shall show that among all polynomials of degree less than or equal to n 
satisfying condition (37) there is one that minimizes (38); denote such a polynomial 
by $Q_n$. 

We formulate now the variational condition characterizing this minimum. Let 
R(a) be any polynomial of degree less than n; then aR(a) is of degree less than or 
equal to n. Let $\epsilon$ be any real number; $Q_n(a) + \epsilon aR(a)$ is then a polynomial of degree 
less than or equal to n that satisfies condition (37). Since $Q_n$ minimizes (38), 
$\|(Q_n(A) + \epsilon AR(A))r_0\|^2$ takes on its minimum at $\epsilon = 0$. Therefore its derivative 
with respect to $\epsilon$ is zero there: 

$$(Q_n(A)r_0, AR(A)r_0) = 0. \tag{39}$$

We define now a scalar product for polynomials Q and R as follows: 

$$\{Q, R\} = (Q(A)r_0, AR(A)r_0). \tag{40}$$

To analyze this scalar product we introduce the eigenvectors of the matrix A: 

$$Af_j = a_j f_j. \tag{41}$$

Since the matrix A is real and self-adjoint, the $f_j$ can be taken to be real and 
orthonormal; since A is positive, its eigenvalues $a_j$ are positive. 

We expand $r_0$ in terms of the $f_j$, 

$$r_0 = \sum w_j f_j. \tag{42}$$


Since the $f_j$ are eigenvectors of A, they are also eigenvectors of Q(A) and R(A), and by 
the spectral mapping theorem their eigenvalues are $Q(a_j)$ and $R(a_j)$, respectively. So 

$$Q(A)r_0 = \sum w_j Q(a_j) f_j, \qquad R(A)r_0 = \sum w_j R(a_j) f_j. \tag{43}$$

Since the $f_j$ are orthonormal, we can express the scalar product (40) for polynomials 
Q and R as follows: 

$$\{Q, R\} = \sum w_j^2 a_j Q(a_j) R(a_j). \tag{44}$$


Theorem 3. Suppose that in the expansion (42) of $r_0$ none of the coefficients $w_j$ 
are 0; suppose further that the eigenvalues $a_j$ of A are distinct. Then (44) furnishes a 
Euclidean structure to the space of all polynomials of degree less than the order K of 
the matrix A. 


Proof. According to Chapter 7, a scalar product needs three properties. The first 
two, bilinearity and symmetry, are obvious from either (40) or (44). To show 
positivity, we note that since each $a_j > 0$, 

$$\{Q, Q\} = \sum w_j^2 a_j Q^2(a_j) \tag{45}$$

is obviously nonnegative. Since the $w_j$ are assumed nonzero, (45) is zero iff 
$Q(a_j) = 0$ for all $a_j$, j = 1, ..., K. Since the degree of Q is less than K, it can vanish 
at K points only if $Q \equiv 0$. □

We can express the minimizing condition (39) concisely in the language of the 
scalar product (40): For n < K, $Q_n$ is orthogonal to all polynomials of degree less 
than n. It follows in particular that $Q_n$ is of degree n. 

According to condition (37), $Q_0 \equiv 1$. Using the familiar Gram–Schmidt process 
we can, using the orthogonality and condition (37), determine a unique sequence of 
polynomials $Q_n$. We show now that this sequence satisfies a three-term recursion 
relation. To see this we express $aQ_n(a)$ as a linear combination of $Q_j$, j = 0, ..., n + 1: 

$$aQ_n = \sum_{j=0}^{n+1} c_{n,j} Q_j. \tag{46}$$

Since the $Q_j$ are orthogonal, we can express the $c_{n,j}$ as 

$$c_{n,j} = \frac{\{aQ_n, Q_j\}}{\{Q_j, Q_j\}}. \tag{47}$$


Since A is self-adjoint, the numerator in (47) can be rewritten as 

$$\{Q_n, aQ_j\}. \tag{47'}$$

Since for j < n − 1, $aQ_j$ is a polynomial of degree less than n, it is orthogonal to $Q_n$, 
and so (47)' is zero; therefore $c_{n,j} = 0$ for j < n − 1. This shows that the right-hand 
side of (46) has only three nonzero terms and can be written in the form 

$$aQ_n = b_n Q_{n+1} + c_n Q_n + d_n Q_{n-1}. \tag{48}$$


Since $Q_n$ is of degree n, $b_n \ne 0$. For n = 0 there is no term $Q_{-1}$, so $d_0 = 0$. 
According to condition (37), $Q_k(0) = 1$ for all k. Setting a = 0 in (48) we deduce 
that 

$$b_n + c_n + d_n = 0. \tag{49}$$

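From (47), with j = n, n − 1 we have 

$$c_n = \frac{\{aQ_n, Q_n\}}{\{Q_n, Q_n\}}, \qquad d_n = \frac{\{aQ_n, Q_{n-1}\}}{\{Q_{n-1}, Q_{n-1}\}}. \tag{50}$$

Since $b_n \ne 0$, we can express $Q_{n+1}$ from (48) as follows: 

$$Q_{n+1} = (s_n a + p_n)Q_n + q_n Q_{n-1}, \tag{51}$$

where 

$$s_n = \frac{1}{b_n}, \qquad p_n = -\frac{c_n}{b_n}, \qquad q_n = -\frac{d_n}{b_n}. \tag{52}$$
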
Note that it follows from (49) and (52) that 

$$p_n + q_n = 1. \tag{53}$$

Theoretically, the formulas (50) completely determine the quantities $c_n$ and $d_n$. 
Practically, these formulas are quite useless, since in order to evaluate the curly 
brackets we need to know the polynomials $Q_k$ and evaluate $Q_k(A)$. Fortunately $c_n$ 
and $d_n$ can be evaluated more easily, as we show next. 

We start the algorithm by choosing an $x_0$; then the rest of the $x_n$ are determined by 
the recursion (36), with $s_n$, $p_n$, and $q_n$ from formulas (52), (50), and (49). We have 
defined $e_n$ to be $x_n - x$, the nth error; subtracting x from (36), making use of (53) 
and of b = Ax, we obtain 

$$e_{n+1} = (s_n A + p_n I)e_n + q_n e_{n-1}. \tag{54}$$

We claim that 

$$e_n = Q_n(A)e_0. \tag{55}$$

To see this we replace the scalar argument a in (51) by the matrix argument A: 

$$Q_{n+1}(A) = (s_n A + p_n I)Q_n(A) + q_n Q_{n-1}(A). \tag{56}$$

Let both sides of (56) act on $e_0$; we get a recurrence relation that is the same as (54), 
except that $e_k$ is replaced by $Q_k(A)e_0$. Since $Q_0(A) = I$, the two sequences have the 
same starting point, and therefore they are the same, as asserted in (55). 


We recall now that the residual $r_n = Ax_n - b$ is related to $e_n = x_n - x$ by 
$r_n = Ae_n$. Applying A to (55) we obtain 

$$r_n = Q_n(A)r_0. \tag{57}$$

Applying the mapping A to (54) gives a recursion relation for the residuals: 

$$r_{n+1} = (s_n A + p_n I)r_n + q_n r_{n-1}. \tag{58}$$

We now set $Q = Q_n$, $R = Q_n$ into (40), and use relation (57) to write 

$$\{Q_n, Q_n\} = (r_n, Ar_n). \tag{59}$$

Subsequently we set $Q = aQ_n$, $R = Q_n$ into (40), and use relation (57) to write 

$$\{aQ_n, Q_n\} = (Ar_n, Ar_n). \tag{59'}$$

Finally we set $Q = aQ_n$ and $R = Q_{n-1}$ into (40), and we use relation (57) to write 

$$\{aQ_n, Q_{n-1}\} = (Ar_n, Ar_{n-1}). \tag{59''}$$

We set these identities into (50): 

$$c_n = \frac{(Ar_n, Ar_n)}{(r_n, Ar_n)}, \qquad d_n = \frac{(Ar_n, Ar_{n-1})}{(r_{n-1}, Ar_{n-1})}. \tag{60}$$

From (49) we determine $b_n = -(c_n + d_n)$. Set these expressions into (52) and we 
obtain expressions for $s_n$, $p_n$, and $q_n$ that are simple to evaluate once $r_{n-1}$ and $r_n$ are 
known; these residuals can be calculated as soon as we know $x_{n-1}$ and $x_n$, or from 
recursion (58). This completes the recursive definition of the sequence $x_n$. 
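
The recursive definition just completed can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: A is a small dense symmetric positive matrix stored as a list of rows, the function and variable names are ours, and the first step is started with $d_0 = 0$ (so $q_0 = 0$, $p_0 = 1$), which is our reading of the fact that there is no polynomial $Q_{-1}$. The coefficients come from (60), (49), and (52); the approximations are advanced by (36) and the residuals by (58).

```python
def matvec(A, v):
    """Product of a dense matrix (list of rows) with a vector."""
    return [sum(a * t for a, t in zip(row, v)) for row in A]

def dot(u, v):
    """Euclidean scalar product."""
    return sum(s * t for s, t in zip(u, v))

def optimal_three_term(A, b, x0, tol=1e-12, max_steps=100):
    """Three-term recursion (36) with coefficients from (60), (49), (52)."""
    x = list(x0)
    r = [ax - bi for ax, bi in zip(matvec(A, x), b)]    # r_0 = A x_0 - b
    x_prev, r_prev, Ar_prev = x, r, None                # placeholders for n = 0
    for step in range(max_steps):
        if dot(r, r) ** 0.5 <= tol:
            return x, step
        Ar = matvec(A, r)
        c = dot(Ar, Ar) / dot(r, Ar)                    # c_n from (60)
        if Ar_prev is None:
            d = 0.0                                     # assumed start: d_0 = 0
        else:
            d = dot(Ar, Ar_prev) / dot(r_prev, Ar_prev) # d_n from (60)
        b_n = -(c + d)                                  # (49)
        s, p, q = 1.0 / b_n, -c / b_n, -d / b_n         # (52)
        # (36), written with the residual: x_{n+1} = s r_n + p x_n + q x_{n-1}
        x_next = [s * ri + p * xi + q * xo for ri, xi, xo in zip(r, x, x_prev)]
        # (58): r_{n+1} = s A r_n + p r_n + q r_{n-1}
        r_next = [s * ai + p * ri + q * ro for ai, ri, ro in zip(Ar, r, r_prev)]
        x_prev, r_prev, Ar_prev = x, r, Ar
        x, r = x_next, r_next
    return x, max_steps

A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x, steps = optimal_three_term(A, b, [0.0, 0.0, 0.0])
print(x, steps)   # the exact solution is (2/9, 1/9, 13/9)
```

Updating the residual by (58) costs one multiplication by A per step; recomputing it as $Ax_n - b$ would cost a second one.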


Theorem 4. Let K be the order of the matrix A, and let $x_K$ be the Kth term of the 
sequence (36), the coefficients being defined by (52) and (60). We claim that $x_K$ 
satisfies equation (1), $Ax_K = b$. 


Proof. $Q_K$ is defined as that polynomial of degree K which satisfies (37) and 
minimizes (38). We claim that this polynomial is $p_A/p_A(0)$, $p_A$ the characteristic 
polynomial of A; note that $p_A(0) \ne 0$, since 0 is not an eigenvalue of A. According 
to the Cayley–Hamilton theorem, Theorem 5 of Chapter 6, $p_A(A) = 0$; clearly, 
$Q_K(A) = 0$ minimizes $\|Q(A)r_0\|$. According to (57), $r_K = Q_K(A)r_0$; since 
according to the above discussion, $Q_K(A) = 0$, this proves that the Kth residual 
$r_K$ is zero, and therefore $x_K$ exactly solves (1). □


One should not be misled by Theorem 4; the virtue of the sequence x, is not that it 
furnishes the exact answer in K steps, but that, for a large class of matrices of 
practical interest, it furnishes an excellent approximation to the exact answer in far 
fewer steps than K. Suppose for instance that A is the discretization of an operator of 
the form identity plus a compact operator. Then most of the eigenvalues of A would 
be clustered around 1; say all but the first k eigenvalues $a_j$ of A are located in the 
interval $(1 - \delta, 1 + \delta)$. 

Since $Q_n$ was defined as the minimizer of (38) subject to the condition Q(0) = 1, 
and since according to (57), $Q_n(A)r_0 = r_n$, we conclude that 

$$\|r_n\| \le \|Q(A)r_0\|$$

for any polynomial Q of degree n that satisfies Q(0) = 1. Using formula (45) we 
write this inequality as 

$$\|r_n\|^2 \le \sum w_j^2 Q^2(a_j), \tag{61}$$

where the $w_j$ are the coefficients in the expansion of $r_0$. 

We set now n = k + l, and we choose Q as follows: 

$$Q(a) = \prod_{j=1}^{k}\left(1 - \frac{a}{a_j}\right) T_l\!\left(\frac{a - 1}{\delta}\right) \bigg/ T_l\!\left(-\frac{1}{\delta}\right); \tag{62}$$
here, as before, $T_l$ denotes the lth Chebyshev polynomial. Clearly, Q satisfies 
condition (37), Q(0) = 1. For a large, $T_l(a)$ is dominated by its leading term, which 
is $2^{l-1}a^l$. Therefore, 

$$\left|T_l\!\left(-\frac{1}{\delta}\right)\right| \simeq \frac{1}{2}\left(\frac{2}{\delta}\right)^l. \tag{63}$$

By construction, Q vanishes at $a_1, \dots, a_k$. We have assumed that all the other $a_j$ lie in 
$(1 - \delta, 1 + \delta)$; since the Chebyshev polynomials do not exceed 1 in absolute value in 
(−1, 1), it follows from (62) and (63) that for j > k, 

$$|Q(a_j)| \le \mathrm{const}\left(\frac{\delta}{2}\right)^l, \tag{64}$$

where 

$$\mathrm{const} \simeq 2\prod_{j=1}^{k}\left(1 - \frac{1}{a_j}\right). \tag{65}$$

Setting all this information about $Q(a_j)$ into (61) we obtain 

$$\|r_{k+l}\|^2 \le \mathrm{const}^2\left(\frac{\delta}{2}\right)^{2l} \sum_{j > k} w_j^2 \le \mathrm{const}^2\left(\frac{\delta}{2}\right)^{2l} \|r_0\|^2. \tag{66}$$

For example if $|a_j - 1| < 0.2$ for j > 10, and if the constant (65) is less than 10, 
then choosing l = 20 in (66) makes $\|r_{30}\|$ less than $10^{-19}\|r_0\|$. 


EXERCISE 3. Write a computer program to evaluate the quantities $s_n$, $p_n$, and $q_n$. 


EXERCISE 4. Use the computer program to solve a system of equations of your 
choice. 


CHAPTER 18 


How to Calculate the Eigenvalues 
of Self-Adjoint Matrices 


1. One of the most effective methods for calculating approximately the 
eigenvalues of a self-adjoint matrix is based on the QR decomposition. 


Theorem 1. Every real invertible square matrix A can be factored as 

$$A = QR, \tag{1}$$

where Q is an orthogonal matrix and R is an upper triangular matrix whose diagonal 
entries are positive. 


Proof. The columns of Q are constructed out of the columns of A by Gram–Schmidt 
orthonormalization. So the jth column $q_j$ of Q is a linear combination of the 
first j columns $a_1, \dots, a_j$ of A: 

$$q_1 = c_{11}a_1,$$
$$q_2 = c_{12}a_1 + c_{22}a_2,$$

etc. We can invert the relation between the q's and the a's: 

$$\begin{aligned} a_1 &= r_{11}q_1, \\ a_2 &= r_{12}q_1 + r_{22}q_2, \\ &\;\;\vdots \\ a_n &= r_{1n}q_1 + \cdots + r_{nn}q_n. \end{aligned} \tag{2}$$

Since A is invertible, its columns are linearly independent. It follows that all 
coefficients $r_{11}, \dots, r_{nn}$ in (2) are nonzero. 

We may multiply any of the vectors $q_j$ by −1, without affecting their 
orthonormality. In this way we can make all the coefficients $r_{11}, \dots, r_{nn}$ in (2) 
positive. Here A is an n × n matrix. 

Denote the matrix whose columns are $q_1, \dots, q_n$ by Q, and denote by R the matrix 

$$R_{ij} = \begin{cases} r_{ij} & \text{for } i \le j, \\ 0 & \text{for } i > j. \end{cases} \tag{3}$$

Relation (2) can be written as a matrix product 

$$A = QR.$$

Since the columns of Q are orthonormal, Q is an orthogonal matrix. 
It follows from the definition (3) of R that R is upper triangular. So A = QR is the 
sought-after factorization (1). □


The factorization (1) can be used to solve the system of equations 

$$Ax = u.$$

Replace A by its factored form, 

$$QRx = u,$$

and multiply by $Q^T$ on the left. Since Q is an orthogonal matrix, $Q^TQ = I$, and we get 

$$Rx = Q^Tu. \tag{4}$$

Since R is upper triangular and its diagonal entries are nonzero, the system of 
equations can be solved recursively, starting with the nth equation to determine $x_n$, 
then the (n − 1)st equation to determine $x_{n-1}$, and so all the way down to $x_1$. 
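
The construction in the proof of Theorem 1 and the recursive solution of (4) can be sketched as follows, in plain Python with dense matrices stored as lists of rows. The function names and the 2 × 2 example are our own; the factorization uses classical Gram–Schmidt exactly as in the proof, which is adequate for illustration but not the numerically preferred variant.

```python
import math

def qr_gram_schmidt(A):
    """Factor an invertible square matrix as A = QR by Gram-Schmidt."""
    n = len(A)
    columns = [[A[i][j] for i in range(n)] for j in range(n)]
    q_columns = []
    R = [[0.0] * n for _ in range(n)]
    for j, a in enumerate(columns):
        v = list(a)
        for i, q in enumerate(q_columns):
            R[i][j] = sum(qk * ak for qk, ak in zip(q, a))   # r_ij = (q_i, a_j)
            v = [vk - R[i][j] * qk for vk, qk in zip(v, q)]
        R[j][j] = math.sqrt(sum(vk * vk for vk in v))        # positive diagonal
        q_columns.append([vk / R[j][j] for vk in v])
    Q = [[q_columns[j][i] for j in range(n)] for i in range(n)]
    return Q, R

def solve_qr(A, u):
    """Solve Ax = u through Rx = Q^T u, by back substitution."""
    n = len(A)
    Q, R = qr_gram_schmidt(A)
    rhs = [sum(Q[i][j] * u[i] for i in range(n)) for j in range(n)]  # Q^T u
    x = [0.0] * n
    for i in reversed(range(n)):          # nth equation first, then upward
        tail = sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (rhs[i] - tail) / R[i][i]
    return x

A = [[2.0, 1.0], [1.0, 3.0]]
u = [3.0, 5.0]
Q, R = qr_gram_schmidt(A)
x = solve_qr(A, u)
print(x)   # the exact solution is (0.8, 1.4)
```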

In this chapter we shall show how to use the QR factorization of a real symmetric 
matrix A to find its eigenvalues. The QR algorithm was invented by J.G.F. Francis in 
1961; it goes as follows: 

Let A be a real symmetric matrix; we may assume that A is invertible, for we may 
add a constant multiple of the identity to A. Find the QR factorization of A: 

$$A = QR.$$

Define $A_1$ by switching the factors Q and R: 

$$A_1 = RQ. \tag{5}$$

We claim that 

(i) $A_1$ is real and symmetric, and 
(ii) $A_1$ has the same eigenvalues as A. 

To see these we express R in terms of A and Q by multiplying equation (1) by $Q^T$ 
on the left. Since $Q^TQ = I$, we get 

$$Q^TA = R.$$

Setting this into (5) gives 

$$A_1 = Q^TAQ, \tag{6}$$

from which (i) and (ii) follow. 
We continue this process, getting a sequence of matrices $\{A_k\}$, each linked to the 
next one by the relations 

$$A_{k-1} = Q_kR_k, \tag*{$(7)_k$}$$
$$A_k = R_kQ_k. \tag*{$(8)_k$}$$

From these we deduce, as before, that 

$$A_k = Q_k^TA_{k-1}Q_k. \tag*{$(9)_k$}$$

It follows that all the matrices $A_k$ are symmetric, and they all have the same 
eigenvalues. 


Combining the relations $(9)_k, (9)_{k-1}, \dots, (9)_1$ we get 

$$A_k = Q^{(k)T}AQ^{(k)}, \tag*{$(10)_k$}$$

where 

$$Q^{(k)} = Q_1Q_2\cdots Q_k. \tag{11}$$

Define similarly 

$$R^{(k)} = R_kR_{k-1}\cdots R_1. \tag{12}$$

We claim that 

$$A^k = Q^{(k)}R^{(k)}. \tag*{$(13)_k$}$$

For k = 1 this is relation (1). We argue inductively; suppose $(13)_{k-1}$ is true: 

$$A^{k-1} = Q^{(k-1)}R^{(k-1)}.$$

Multiply this by A on the left: 

$$A^k = AQ^{(k-1)}R^{(k-1)}. \tag{14}$$

Multiply equation $(10)_{k-1}$ by $Q^{(k-1)}$ on the left. Since $Q^{(k-1)}$ is a product of 
orthogonal matrices, it is itself orthogonal, and so $Q^{(k-1)}Q^{(k-1)T} = I$. So we get that 

$$Q^{(k-1)}A_{k-1} = AQ^{(k-1)}.$$

Combining this with (14) gives 

$$A^k = Q^{(k-1)}A_{k-1}R^{(k-1)}.$$

Now use $(7)_k$ to express $A_{k-1}$, and we get relation $(13)_k$. 
This completes the inductive proof of $(13)_k$. □


Formula (12) defines $R^{(k)}$ as the product of upper triangular matrices. Therefore 
$R^{(k)}$ itself is upper triangular, and so $(13)_k$ is the QR factorization of $A^k$. 

Denote the normalized eigenvectors of A by $u_1, \dots, u_n$, its corresponding 
eigenvalues by $d_1, \dots, d_n$. 

Denote by U the matrix whose columns are the eigenvectors, 

$$U = (u_1, \dots, u_n),$$

and by D the diagonal matrix whose entries are $d_1, \dots, d_n$. The spectral 
representation of A is 

$$A = UDU^T. \tag{15}$$

Therefore the spectral representation of $A^k$ is 

$$A^k = UD^kU^T. \tag*{$(15)_k$}$$

It follows from formula $(15)_k$ that the columns of $A^k$ are linear combinations of 
the eigenvectors of A of the following form: 

$$b_1d_1^ku_1 + \cdots + b_nd_n^ku_n, \tag{15'}$$


where $b_1, \dots, b_n$ do not depend on k. We assume now that the eigenvalues of A are 
distinct and positive; arrange them in decreasing order: 

$$d_1 > d_2 > \cdots > d_n > 0.$$

It follows then from (15)' that, provided $b_1 \ne 0$, for k large enough the first column 
of $A^k$ is very close to a constant multiple of $u_1$. Therefore $q_1^{(k)}$, the first column of 
$Q^{(k)}$, is very close to $u_1$. Similarly, $q_2^{(k)}$, the second column of $Q^{(k)}$, would be very 
close to $u_2$, and so on, up to $q_n^{(k)} \simeq u_n$. 

We turn now to formula $(10)_k$: it follows that the ith diagonal element of 
$A_k$ is 

$$(A_k)_{ii} = q_i^{(k)T}Aq_i^{(k)} = (q_i^{(k)}, Aq_i^{(k)}).$$

The quantity on the right is the Rayleigh quotient of A, evaluated at $q_i^{(k)}$. It was 
explained in Chapter 8 that if the vector $q_i^{(k)}$ differs by $\epsilon$ from the ith eigenvector of 
A, then the Rayleigh quotient differs by less than $O(\epsilon^2)$ from the ith eigenvalue $d_i$ 
of A. This shows that if the QR algorithm is carried out far enough, the diagonal 
entries of $A_k$ are very good approximations to the eigenvalues of A, arranged in 
decreasing order. 


EXERCISE 1. Show that the off-diagonal entries of $A_k$ tend to zero as k tends 
to $\infty$. 


Numerical calculations bear out these contentions. 
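
A small experiment of this kind can be sketched in plain Python. The code below is an illustration only: it uses the unshifted iteration $A_{k-1} = Q_kR_k$, $A_k = R_kQ_k$ described above, a Gram–Schmidt factorization, a 3 × 3 test matrix of our choosing whose eigenvalues $2 + \sqrt{2}$, $2$, $2 - \sqrt{2}$ are known in closed form, and a fixed iteration count picked generously by us.

```python
import math

def qr_gram_schmidt(A):
    """Factor an invertible square matrix as A = QR by Gram-Schmidt."""
    n = len(A)
    columns = [[A[i][j] for i in range(n)] for j in range(n)]
    q_columns = []
    R = [[0.0] * n for _ in range(n)]
    for j, a in enumerate(columns):
        v = list(a)
        for i, q in enumerate(q_columns):
            R[i][j] = sum(qk * ak for qk, ak in zip(q, a))
            v = [vk - R[i][j] * qk for vk, qk in zip(v, q)]
        R[j][j] = math.sqrt(sum(vk * vk for vk in v))
        q_columns.append([vk / R[j][j] for vk in v])
    Q = [[q_columns[j][i] for j in range(n)] for i in range(n)]
    return Q, R

def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def qr_algorithm(A, iterations=200):
    """Repeat: factor A_{k-1} = Q_k R_k, then set A_k = R_k Q_k."""
    Ak = [row[:] for row in A]
    for _ in range(iterations):
        Q, R = qr_gram_schmidt(Ak)
        Ak = matmul(R, Q)
    return Ak

A = [[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]]
Ak = qr_algorithm(A)
diagonal = [Ak[i][i] for i in range(3)]
off_diagonal = max(abs(Ak[i][j]) for i in range(3) for j in range(3) if i != j)
print(diagonal, off_diagonal)
```

The diagonal entries appear in decreasing order, and the off-diagonal entries have shrunk to rounding level.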


2. Next we describe another algorithm, due to Alston Householder, for 
accomplishing the QR factorization of a matrix A. In this algorithm, Q is constructed 
as a product of particularly simple orthogonal transformations, known as reflections. 

A Householder reflection is simply a reflection across a hyperplane, that is, a 
subspace of form $v^Tx = 0$. A reflection H maps all points of the hyperplane into 
themselves, and it maps points x off the hyperplane into their reflection across the 
hyperplane. The analytical expression of H is 

$$Hx = x - 2\,\frac{v^Tx}{\|v\|^2}\,v. \tag{16}$$

Note that if we replace v by a multiple of v, the mapping H is unchanged. 

EXERCISE 2. Show that the mapping (16) is norm-preserving. 


We shall show now how reflections can be used to accomplish the QR factorization 
of a matrix A. Q will be constructed as the product of n reflections: 

$$Q = H_nH_{n-1}\cdots H_1.$$

$H_1$ is chosen so that the first column of $H_1A$ is a multiple of $e_1 = (1, 0, \dots, 0)^T$. That 
requires that $H_1a_1$ be a multiple of $e_1$; since $H_1$ is norm-preserving, that multiple has 
to have absolute value $\|a_1\|$. This leaves two choices: 

$$H_1a_1 = \|a_1\|e_1 \quad\text{or}\quad H_1a_1 = -\|a_1\|e_1.$$

Setting $x = a_1$ into (16), we get for $H_1$ the relation 

$$a_1 - v = \|a_1\|e_1 \quad\text{or}\quad a_1 - v = -\|a_1\|e_1,$$

which gives two choices for v: 

$$v_+ = a_1 - \|a_1\|e_1 \quad\text{or}\quad v_- = a_1 + \|a_1\|e_1. \tag{17}$$

We recall that the arithmetical operations in a computer carry a finite number of 
digits. Therefore when two nearly equal numbers are subtracted, the relative error in 
the difference is quite large. To prevent such loss, we choose in (17) the larger of the 
two vectors $v_+$ or $v_-$ for v. 
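
A single reflection is easy to try out. The sketch below, in plain Python, assumes the standard form $Hx = x - 2\,(v^Tx / v^Tv)\,v$ for the reflection (16) and picks the larger of the two vectors in (17) by adding $\|a_1\|$ to the first component when that component is nonnegative and subtracting it otherwise; the function names and the sample vector are ours.

```python
import math

def reflect(v, x):
    """Apply the reflection H x = x - 2 (v.x / v.v) v across v.x = 0."""
    coefficient = 2.0 * sum(vi * xi for vi, xi in zip(v, x)) / sum(vi * vi for vi in v)
    return [xi - coefficient * vi for xi, vi in zip(x, v)]

def householder_vector(a):
    """Return the larger of v = a - |a| e_1 and v = a + |a| e_1, as in (17)."""
    norm_a = math.sqrt(sum(t * t for t in a))
    v = list(a)
    # The two candidates differ only in the first component; adding a
    # quantity of the same sign avoids subtracting nearly equal numbers.
    v[0] += norm_a if a[0] >= 0 else -norm_a
    return v

a = [3.0, 4.0, 0.0]
v = householder_vector(a)     # here v = a + 5 e_1 = (8, 4, 0)
Ha = reflect(v, a)            # should be a multiple of e_1 of length |a| = 5
print(v, Ha)
```

With this choice the image of $a_1$ is $-\|a_1\|e_1$ when the first component of $a_1$ is nonnegative, so the corresponding diagonal entry of the triangular factor comes out negative; its sign can be changed afterward as in the proof of Theorem 1.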

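Having chosen $H_1$, denote $H_1 A$ as $A_1$; it is of the form 

$$A_1 = \begin{pmatrix} \times & \times \cdots \times \\ 0 & \\ \vdots & A^{(1)} \\ 0 & \end{pmatrix},$$

where $A^{(1)}$ is an $(n-1) \times (n-1)$ matrix. 
Choose $H_2$ to be of the form 

$$H_2 = \begin{pmatrix} 1 & 0 \cdots 0 \\ 0 & \\ \vdots & H^{(2)} \\ 0 & \end{pmatrix},$$

where $H^{(2)}$ is chosen as before so that the first column of 

$$H^{(2)} A^{(1)}$$

is of the form $(\times, 0, \ldots, 0)^T$. Then the first column of the product $H_2 A_1$ is the 
same as the first column of $A_1$, while the second column is of the form 
$(\times, \times, 0, \ldots, 0)^T$. We continue in this fashion for n steps; clearly, $A_n = H_n \cdots H_1 A$ 
is upper triangular. Then we set $R = A_n$ and $Q = H_1^T \cdots H_n^T$ and obtain the QR 
factorization (1) of A. □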
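
The construction above can be sketched in Python as follows. This is a minimal illustration, not the author's code: the function names `householder_qr` and `matmul` and the test matrix are assumptions, and the reflections are applied in place to the trailing rows rather than formed as full matrices. The sign choice follows the recommendation to take the larger of the two candidate vectors in (17).

```python
import math

def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def householder_qr(A):
    """Return (Q, R) with A = Q R, Q orthogonal and R upper triangular."""
    n = len(A)
    R = [row[:] for row in A]
    # Qt accumulates the product H_k ... H_1, that is, the transpose of Q.
    Qt = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n):
        a = [R[i][k] for i in range(k, n)]        # part of column k to be reduced
        alpha = math.sqrt(sum(t * t for t in a))
        if alpha == 0.0:
            continue                               # nothing to annihilate
        v = a[:]
        # Larger of the two vectors in (17): add alpha with the sign of a[0].
        v[0] += alpha if a[0] >= 0.0 else -alpha
        vv = sum(t * t for t in v)
        for M in (R, Qt):                          # apply H_k on the left to both
            for j in range(n):
                s = sum(v[i - k] * M[i][j] for i in range(k, n))
                c = 2.0 * s / vv
                for i in range(k, n):
                    M[i][j] -= c * v[i - k]
    Q = [[Qt[j][i] for j in range(n)] for i in range(n)]
    return Q, R

A = [[4.0, 1.0, 2.0], [2.0, 3.0, 1.0], [1.0, 2.0, 5.0]]   # sample matrix
Q, R = householder_qr(A)
QR = matmul(Q, R)
QtQ = matmul([list(col) for col in zip(*Q)], Q)
```

For the sample matrix, `QR` reproduces `A`, `R` is upper triangular up to rounding, and `QtQ` is the identity up to rounding.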


Next we show how reflections can be used to bring any symmetric matrix A into 
tridiagonal form L by an orthogonal similarity transformation: 

$$O A O^T = L. \tag{18}$$

O is a product of reflections: 

$$O = H_{n-1} \cdots H_1. \tag*{$(18)'$}$$

$H_1$ is of the form 

$$H_1 = \begin{pmatrix} 1 & 0 \cdots 0 \\ 0 & \\ \vdots & H^{(1)} \\ 0 & \end{pmatrix}. \tag{19}$$

Denote the first column of A as 

$$\begin{pmatrix} \times \\ a^{(1)} \end{pmatrix},$$

where $a^{(1)}$ is a column vector with $n - 1$ components. Then the action of $H_1$ of form 
(19) is as follows: 

$H_1 A$ has the same first row as A, and the last $n - 1$ entries of the first column of 
$H_1 A$ are $H^{(1)} a^{(1)}$. 

We choose $H^{(1)}$ as a reflection in $\mathbb{R}^{n-1}$ that maps $a^{(1)}$ into a vector whose last $n - 2$ 
components are zero. Thus the first column of $H_1 A$ has zeros in the last $n - 2$ places. 

Multiplying an $n \times n$ matrix by a matrix of the form (19) on the right leaves the 
first column unaltered. Therefore the first column of 

$$A_1 = H_1 A H_1^T$$

has zeros in the last $n - 2$ rows. 
In the next step we choose $H_2$ of the form 

$$H_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & H^{(2)} \end{pmatrix}, \tag{20}$$

where $H^{(2)}$ is an $(n-2) \times (n-2)$ reflection. Since the first column of $A_1$ has zeros 
in the last $n - 2$ rows, the first column of $H_2 A_1$ is the same as the first column of $A_1$. 
We choose the reflection $H^{(2)}$ so that the second column of $H_2 A_1$ has zeros in the last 
$n - 3$ rows. 

For $H_2$ of form (20), multiplication on the right by $H_2^T$ leaves the first two columns 
unchanged. Therefore 

$$A_2 = H_2 A_1 H_2^T$$

has $n - 2$ and $n - 3$ zeros, respectively, in the first and second columns. Continuing 
in this fashion, we construct the reflections $H_3, \ldots, H_{n-1}$. Their product 
$O = H_{n-1} \cdots H_1$ has the property that $O A O^T$ has all ijth entries zero when 
$i > j + 1$. But since $O A O^T$ is symmetric, so are all entries for $j > i + 1$. This shows 
that $O A O^T$ is tridiagonal. □

We note that Jacobi proposed an algorithm for tridiagonalizing symmetric 
matrices. This was implemented by Wallace Givens. 
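
The reduction to tridiagonal form by reflections can be sketched in Python. The function name `tridiagonalize` and the sample matrix are assumptions made for illustration. The sketch applies each reflection on the left and on the right, and it stops after $n - 2$ steps, since at that point every entry with $|i - j| > 1$ has already been annihilated.

```python
import math

def tridiagonalize(A):
    """Return a tridiagonal matrix orthogonally similar to the symmetric matrix A."""
    n = len(A)
    T = [row[:] for row in A]
    for k in range(n - 2):
        a = [T[i][k] for i in range(k + 1, n)]    # entries of column k below the diagonal
        alpha = math.sqrt(sum(t * t for t in a))
        if alpha == 0.0:
            continue
        v = a[:]
        v[0] += alpha if a[0] >= 0.0 else -alpha  # larger of the two candidate vectors
        vv = sum(t * t for t in v)
        # Left multiplication by the reflection acts on rows k+1, ..., n-1.
        for j in range(n):
            c = 2.0 * sum(v[i - k - 1] * T[i][j] for i in range(k + 1, n)) / vv
            for i in range(k + 1, n):
                T[i][j] -= c * v[i - k - 1]
        # Right multiplication by its transpose acts on columns k+1, ..., n-1.
        for i in range(n):
            c = 2.0 * sum(T[i][j] * v[j - k - 1] for j in range(k + 1, n)) / vv
            for j in range(k + 1, n):
                T[i][j] -= c * v[j - k - 1]
    return T

A = [[4.0, 1.0, 2.0, 3.0],
     [1.0, 3.0, 0.0, 1.0],
     [2.0, 0.0, 5.0, 2.0],
     [3.0, 1.0, 2.0, 6.0]]                        # sample symmetric matrix
T = tridiagonalize(A)
```

Because the transformation is an orthogonal similarity, the trace and the sum of squares of all entries of `T` agree with those of `A` up to rounding.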


Theorem 2. When the QR algorithm $(7)_k$, $(8)_k$ is applied to a real, symmetric, 
tridiagonal matrix L, all the matrices $L_k$ produced by the algorithm are real, 
symmetric, and tridiagonal, and have the same eigenvalues as L. 


Proof. We have already shown, see $(9)_k$, that $L_k$ is symmetric and has the same 
eigenvalues as L. To show that $L_k$ is tridiagonal, we start with $L = L_0$ tridiagonal and 
then argue by induction on k. Suppose $L_k$ is tridiagonal and is factored as 
$L_k = Q_k R_k$. We recall that the jth column $q_j$ of $Q_k$ is a linear combination of the first j 
columns of $L_k$; since $L_k$ is tridiagonal, the last $n - j - 1$ entries of $q_j$ are zero. The 
jth column of $R_k Q_k$ is $R_k q_j$; since $R_k$ is upper triangular, it follows that the last 
$n - j - 1$ entries of $R_k q_j$ are zero. This shows that the ijth entry of $L_{k+1} = R_k Q_k$ is zero 
for $i > j + 1$. Since $L_{k+1}$ is symmetric, this proves that $L_{k+1}$ is tridiagonal, completing 
the induction. □


Having L, and thereby all subsequent $L_k$, in tridiagonal form greatly reduces the 
number of arithmetic operations needed to carry out the QR algorithm. 

So the strategy for the tridiagonal case of the QR algorithm is to carry out the QR 
iteration until the off-diagonal entries of $L_k$ are less than a small number. The 
diagonal elements of $L_k$ are then good approximations to the eigenvalues of L. 
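
This strategy can be sketched in Python. The sketch assumes that the QR step has the usual form $L_k = Q_k R_k$, $L_{k+1} = R_k Q_k$, and it uses plain Gram-Schmidt orthogonalization for the factorization, which requires a nonsingular matrix; the names `gram_schmidt_qr` and `qr_iteration`, the tolerance, and the sample matrix are assumptions. It does not exploit the tridiagonal structure to save operations.

```python
import math

def gram_schmidt_qr(A):
    """QR factorization by Gram-Schmidt; assumes A is nonsingular."""
    n = len(A)
    cols = [[A[i][j] for i in range(n)] for j in range(n)]
    q = []                                    # orthonormal columns q_1, ..., q_n
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        w = cols[j][:]
        for i in range(j):
            R[i][j] = sum(q[i][t] * cols[j][t] for t in range(n))
            w = [w[t] - R[i][j] * q[i][t] for t in range(n)]
        R[j][j] = math.sqrt(sum(t * t for t in w))
        q.append([t / R[j][j] for t in w])
    Q = [[q[j][i] for j in range(n)] for i in range(n)]
    return Q, R

def qr_iteration(L, tol=1e-10, max_iter=1000):
    """Iterate L <- R Q until all subdiagonal entries are below tol."""
    n = len(L)
    L = [row[:] for row in L]
    for _ in range(max_iter):
        if max(abs(L[i + 1][i]) for i in range(n - 1)) < tol:
            break
        Q, R = gram_schmidt_qr(L)
        L = [[sum(R[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
    return L

L0 = [[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]]   # sample tridiagonal matrix
Lk = qr_iteration(L0)
diagonal = sorted(Lk[i][i] for i in range(3))
```

The eigenvalues of the sample matrix are $2 - \sqrt{2}$, $2$, and $2 + \sqrt{2}$, so `diagonal` can be compared against them; the corner entries of `Lk` stay at rounding level, in line with Theorem 2.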


3. Deift, Nanda, and Tomei observed that the Toda flow is a continuous analogue 
of the QR iteration. Flaschka has shown that the differential equations for the Toda 
flow can be put into commutator form, that is, in the form 

$$\frac{d}{dt} L = BL - LB, \tag{21}$$

where L is a symmetric tridiagonal matrix 

$$L = \begin{pmatrix} a_1 & b_1 & & 0 \\ b_1 & a_2 & \ddots & \\ & \ddots & \ddots & b_{n-1} \\ 0 & & b_{n-1} & a_n \end{pmatrix}, \tag{22}$$

and B is the antisymmetric tridiagonal matrix 

$$B = \begin{pmatrix} 0 & b_1 & & 0 \\ -b_1 & 0 & \ddots & \\ & \ddots & \ddots & b_{n-1} \\ 0 & & -b_{n-1} & 0 \end{pmatrix}. \tag{23}$$

EXERCISE 3. (i) Show that $BL - LB$ is a tridiagonal matrix. 
(ii) Show that if L satisfies the differential equation (21), its entries satisfy 

$$\frac{d}{dt} a_k = 2\left(b_k^2 - b_{k-1}^2\right), \qquad \frac{d}{dt} b_k = b_k \left(a_{k+1} - a_k\right), \tag{24}$$

where $k = 1, \ldots, n$ and $b_0 = b_n = 0$. 
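
Exercise 3 can be checked numerically on a sample matrix. The Python sketch below assumes that B has $+b_k$ above the diagonal and $-b_k$ below it, and that the entrywise equations read $\dot a_k = 2(b_k^2 - b_{k-1}^2)$, $\dot b_k = b_k(a_{k+1} - a_k)$ with $b_0 = b_n = 0$; the helper names and the sample values are illustrative.

```python
def toda_matrices(a, b):
    """Build the symmetric tridiagonal L and the antisymmetric B from a and b."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    B = [[0.0] * n for _ in range(n)]
    for k in range(n):
        L[k][k] = a[k]
    for k in range(n - 1):
        L[k][k + 1] = L[k + 1][k] = b[k]
        B[k][k + 1] = b[k]
        B[k + 1][k] = -b[k]
    return L, B

def commutator(B, L):
    """Return BL - LB."""
    n = len(L)
    return [[sum(B[i][k] * L[k][j] - L[i][k] * B[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]

a = [1.0, -2.0, 0.5, 3.0]      # sample diagonal entries
b = [0.7, -1.3, 2.0]           # sample off-diagonal entries
L, B = toda_matrices(a, b)
C = commutator(B, L)

bb = [0.0] + b + [0.0]         # pads b_0 = b_n = 0
da = [2.0 * (bb[k + 1] ** 2 - bb[k] ** 2) for k in range(4)]   # expected diagonal of C
db = [b[k] * (a[k + 1] - a[k]) for k in range(3)]              # expected off-diagonal of C
```

A single numerical instance is of course no proof, but it is a quick way to catch a sign error in B.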


Theorem 3. Solutions L(t) of equations in commutator form (21), where B is 
antisymmetric, are isospectral. 


Proof. Let the matrix V(t) be the solution of the differential equation 

$$\frac{d}{dt} V = BV, \qquad V(0) = I. \tag{25}$$

Since B(t) is antisymmetric, the transpose of (25) is 

$$\frac{d}{dt} V^T = -V^T B, \qquad V^T(0) = I. \tag*{$(25)^T$}$$

Using the product rule for differentiation and equations (25) and $(25)^T$, we get 

$$\frac{d}{dt}\left(V^T V\right) = \left(\frac{d}{dt} V^T\right) V + V^T \frac{d}{dt} V = -V^T B V + V^T B V = 0.$$

Since $V^T V = I$ at $t = 0$, it follows that $V^T(t) V(t) = I$ for all t. This proves that for all 
t, V(t) is an orthogonal matrix. 
We claim that if L(t) is a solution of (21) and V(t) a solution of (25), then 

$$V^T(t) L(t) V(t) \tag{26}$$

is independent of t. Differentiate (26) with respect to t; using the product rule, we get 

$$\left(\frac{d}{dt} V^T\right) L V + V^T \left(\frac{d}{dt} L\right) V + V^T L \frac{d}{dt} V. \tag{27}$$

Using equations (21), (25), and $(25)^T$, we can rewrite (27) as 

$$-V^T B L V + V^T (BL - LB) V + V^T L B V,$$

which is zero. This shows that the derivative of (26) is zero, and therefore (26) is 
independent of t. At $t = 0$, (26) equals L(0), since V(0) is the identity; so 

$$V^T(t) L(t) V(t) = L(0). \tag{28}$$

Since V(t) is an orthogonal matrix, (28) shows that L(t) is related to L(0) by an 
orthogonal similarity. This completes the proof of Theorem 3. □


Formula (28) shows that if L(0) is real symmetric, which we assume, then L(t) 
is symmetric for all t. 
The spectral representation of a symmetric matrix L is 

$$L = U D U^T, \tag{29}$$

where D is a diagonal matrix whose entries are the eigenvalues of L, and the columns 
of U are the normalized eigenvectors of L; (29) shows that a set of symmetric 
matrices whose eigenvalues are uniformly bounded is itself uniformly bounded. So 
we conclude from Theorem 3 that the set of matrices L(t) is uniformly bounded. It 
follows from this that the system of quadratic equations (24) has a solution for all 
values of t. 


Lemma 4. An off-diagonal entry $b_k(t)$ of L(t) is either nonzero for all t, or zero 
for all t. 


Proof. Let $[t_0, t_1]$ be an interval on which $b_k(t)$ is nonzero. Divide the differential 
equation (24) for $b_k$ by $b_k$ and integrate it from $t_0$ to $t_1$: 

$$\log b_k(t_1) - \log b_k(t_0) = \int_{t_0}^{t_1} (a_{k+1} - a_k)\, dt.$$

Since, as we have shown, the functions $a_k$ are uniformly bounded for all t, the 
integral on the right can tend to $\infty$ only if $t_0$ or $t_1$ tends to $\infty$. This shows that 
$\log b_k(t)$ is bounded away from $-\infty$, and therefore $b_k(t)$ is bounded away from zero 
on any bounded interval of t. This proves that if $b_k(t)$ is nonzero for a single value of t, it is 
nonzero for all t. □


If one of the off-diagonal entries $b_k$ of L(0) were zero, the matrix L(0) would fall 
apart into two matrices. We assume that this is not the case; then it follows from 
Lemma 4 that the $b_k(t)$ are nonzero for all t and all k. 


Lemma 5. Suppose none of the off-diagonal terms $b_k$ in L is zero. 


(i) The first component $u_{1k}$ of every eigenvector $u_k$ of L is nonzero. 
(ii) Each eigenvalue of L is simple. 

Proof. (i) The first component of the eigenvalue equation 

$$L u_k = d_k u_k \tag{30}$$

is 

$$a_1 u_{1k} + b_1 u_{2k} = d_k u_{1k}. \tag{31}$$

If $u_{1k}$ were zero, it would follow from (31), since $b_1 \neq 0$, that $u_{2k} = 0$. We can then 
use the second component of (30) to deduce similarly that $u_{3k} = 0$; continuing in 
this fashion, we deduce that all components of $u_k$ are zero, a contradiction. 
(ii) Suppose on the contrary that $d_k$ is a multiple eigenvalue; then its eigenspace 
has dimension greater than 1. In a space of dimension greater than one, we can 
always find a nonzero vector whose first component is zero; but this contradicts part (i) of 
Lemma 5. □


Lemma 6. The eigenvalues $d_1, \ldots, d_n$ and the first components $u_{1k}$, 
$k = 1, \ldots, n$, of the normalized eigenvectors of L uniquely determine all entries 
$a_1, \ldots, a_n$ and $b_1, \ldots, b_{n-1}$ of L. 


Proof. From the spectral representation (29), we can express the entry $L_{11} = a_1$ 
of L as follows: 

$$a_1 = \sum_k d_k u_{1k}^2. \tag*{$(32)_1$}$$

From equation (31) we get 

$$b_1 u_{2k} = (d_k - a_1) u_{1k}. \tag*{$(33)_1$}$$

Squaring both sides and summing with respect to k gives 

$$b_1^2 = \sum_k (d_k - a_1)^2 u_{1k}^2; \tag*{$(34)_1$}$$

here we have used the fact that the matrix U is orthogonal, and therefore 

$$\sum_k u_{2k}^2 = 1.$$

We have shown in Lemma 4 that $b_k(t)$ doesn't change sign; therefore $b_1$ is 
determined by $(34)_1$. We now set this determination of $b_1$ into $(33)_1$ to obtain the 
values of $u_{2k}$. 
Next we use the spectral representation (29) again to express $a_2 = L_{22}$ as 

$$a_2 = \sum_k d_k u_{2k}^2. \tag*{$(32)_2$}$$

We proceed as before to the second equation in (30), which we write as 

$$b_2 u_{3k} = -b_1 u_{1k} + (d_k - a_2) u_{2k}. \tag*{$(33)_2$}$$

Squaring and summing over k gives 

$$b_2^2 = \sum_k \left(-b_1 u_{1k} + (d_k - a_2) u_{2k}\right)^2, \tag*{$(34)_2$}$$

and so on. □
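
The recursive determination in the proof of Lemma 6 can be sketched in Python. The sketch assumes that all $b_j$ are positive, which fixes the sign left open when taking square roots, and it uses the rows of the eigenvalue equation in the form $b_j u_{j+1,k} = (d_k - a_j) u_{jk} - b_{j-1} u_{j-1,k}$; the function name `reconstruct` and the sample data are illustrative.

```python
import math

def reconstruct(d, u1):
    """Recover (a, b) of a symmetric tridiagonal matrix from its eigenvalues d
    and the first components u1 of its normalized eigenvectors (b_j > 0 assumed)."""
    n = len(d)
    a, b = [], []
    prev = [0.0] * n          # components u_{j-1,k}; zero for the first row
    cur = list(u1)            # components u_{jk} of the current row
    b_prev = 0.0
    for j in range(n):
        a_j = sum(dk * c * c for dk, c in zip(d, cur))
        a.append(a_j)
        if j == n - 1:
            break
        # w_k = b_j u_{j+1,k}; its norm is b_j because the rows of U are unit vectors.
        w = [(dk - a_j) * c - b_prev * p for dk, c, p in zip(d, cur, prev)]
        b_j = math.sqrt(sum(t * t for t in w))
        b.append(b_j)
        prev, cur = cur, [t / b_j for t in w]
        b_prev = b_j
    return a, b

# Sample data: the matrix with a = (2, 2, 2), b = (1, 1) has eigenvalues
# 2 + sqrt(2), 2, 2 - sqrt(2) and first eigenvector components 1/2, 1/sqrt(2), 1/2.
d = [2 + math.sqrt(2), 2.0, 2 - math.sqrt(2)]
u1 = [0.5, 1 / math.sqrt(2), 0.5]
a, b = reconstruct(d, u1)
```

For the sample data the recursion returns the entries of the original matrix up to rounding.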
Jürgen Moser has determined the asymptotic behavior of L(t) as t tends 
to $\infty$. 


Theorem 7. (Moser). L(t) is a solution of equation (21). Denote the eigenvalues 
of L by $d_1, \ldots, d_n$, arranged in decreasing order, and denote by D the diagonal matrix 
with diagonal entries $d_1, \ldots, d_n$. Then 

$$\lim_{t \to \infty} L(t) = D. \tag{34}$$

Similarly, 

$$\lim_{t \to -\infty} L(t) = D_-, \tag*{$(34)_-$}$$

where $D_-$ is the diagonal matrix whose diagonal entries are $d_n, \ldots, d_1$. 
Proof. We start with the following lemma. 


Lemma 8. Denote by u(t) the row vector consisting of the first components of 
the normalized eigenvectors of L(t): 

$$u = (u_{11}, \ldots, u_{1n}). \tag{35}$$

Claim: 

$$u(t) = \frac{u(0) e^{Dt}}{\| u(0) e^{Dt} \|}. \tag{36}$$


Proof. We have shown that when L(t) satisfies (21), L(t) and L(0) are related by 
(28). Multiplying this relation by V(t) on the left gives 

$$L(t) V(t) = V(t) L(0). \tag*{$(28)'$}$$

Denote as before the normalized eigenvectors of L(t) by $u_k(t)$. Let $(28)'$ act on $u_k(0)$. 
Since 

$$L(0) u_k(0) = d_k u_k(0),$$

we get 

$$L(t) V(t) u_k(0) = d_k V(t) u_k(0).$$

This shows that $V(t) u_k(0) = u_k(t)$ are the normalized eigenvectors of L(t). 

V(t) satisfies the differential equation (25), $\frac{d}{dt} V = BV$. Therefore 
$u_k(t) = V(t) u_k(0)$ satisfies 

$$\frac{d}{dt} u_k = B u_k. \tag{37}$$

Since B is of form (23), the first component of (37) is 

$$\frac{d}{dt} u_{1k} = b_1 u_{2k}. \tag*{$(37)'$}$$

We now use equation $(33)_1$ to rewrite the right-hand side: 

$$\frac{d}{dt} u_{1k} = (d_k - a_1) u_{1k}. \tag*{$(37)''$}$$

Define f(t) by 

$$f(t) = -\int_0^t a_1(s)\, ds.$$

Equation $(37)''$ can be rewritten as 

$$\frac{d}{dt} \left[ e^{-d_k t - f(t)} u_{1k}(t) \right] = 0,$$

from which we deduce that 

$$e^{-d_k t - f(t)} u_{1k}(t) = c_k,$$

where $c_k$ is a constant. So 

$$u_{1k}(t) = c_k e^{d_k t} F(t),$$

where $F(t) = \exp[f(t)]$. Since $f(0) = 0$, $F(0) = 1$, and $c_k = u_{1k}(0)$; we obtain 

$$u_{1k}(t) = u_{1k}(0) e^{d_k t} F(t). \tag{38}$$

In vector notation, (38) becomes 

$$u(t) = u(0) e^{Dt} F(t).$$

Since u(t) is the first row of an orthogonal matrix, it has norm 1. This shows that F(t) 
is the normalizing factor, and it proves formula (36). □
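
Formula (36) can be made concrete in the $2 \times 2$ case. The worked example below uses initial data chosen here purely for convenience (an assumption, not taken from the text), and it takes the entrywise Toda equations in the form $\dot a_k = 2(b_k^2 - b_{k-1}^2)$, $\dot b_k = b_k(a_{k+1} - a_k)$.

```latex
% Initial data: eigenvalues d_1 = 2, d_2 = 0 with normalized eigenvectors (1, 1)/\sqrt{2}, (1, -1)/\sqrt{2}.
L(0) = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}, \qquad
u(0) = \tfrac{1}{\sqrt{2}}\,(1,\ 1).

% Formula (36):
u(t) = \frac{(e^{2t},\ 1)}{\sqrt{e^{4t} + 1}}.

% First diagonal entry and first off-diagonal entry, from a_1 = \sum_k d_k u_{1k}^2
% and b_1^2 = \sum_k (d_k - a_1)^2 u_{1k}^2:
a_1(t) = \frac{2 e^{4t}}{e^{4t} + 1} = 1 + \tanh 2t, \qquad
b_1^2(t) = \frac{4 e^{4t}}{(e^{4t} + 1)^2} = \operatorname{sech}^2 2t.

% The trace a_1 + a_2 = 2 is conserved, so a_2(t) = 1 - \tanh 2t, and with b_1 = \operatorname{sech} 2t:
\frac{d}{dt} a_1 = 2 \operatorname{sech}^2 2t = 2 b_1^2, \qquad
\frac{d}{dt} b_1 = -2 \operatorname{sech} 2t \,\tanh 2t = b_1 (a_2 - a_1).
```

As $t \to \infty$ this gives $a_1 \to 2$, $a_2 \to 0$, and $b_1 \to 0$ exponentially, so L(t) tends to the diagonal matrix of eigenvalues in decreasing order.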

(i) Since the eigenvalues of L(t) are distinct (see Lemma 5), formula (36) shows 
that as $t \to \infty$, the first component $u_{11}(t)$ of u(t) is exponentially larger than the other 
components. Since the vector u(t) has norm 1, it follows that as $t \to \infty$, $u_{11}(t)$ tends 
to 1, and $u_{1k}(t)$, $k > 1$, tend to zero at an exponential rate. 

(ii) Next we take equation $(32)_1$: 

$$a_1(t) = \sum_k d_k u_{1k}^2(t).$$

From what we have shown about $u_{1k}(t)$, it follows that $a_1(t)$ tends to $d_1$ at an 
exponential rate as $t \to \infty$. 
(iii) To estimate $b_1$ we take the representation $(34)_1$: 

$$b_1^2(t) = \sum_k (d_k - a_1)^2 u_{1k}^2(t).$$

From $a_1(t) \to d_1$ and the fact that u(t) is a unit vector, we deduce that $b_1(t)$ tends to 
zero at an exponential rate as $t \to \infty$. 
(iv) The first two rows of U are orthogonal: 

$$\sum_k u_{1k}(t) u_{2k}(t) = 0. \tag{39}$$

According to (i), $u_{11}(t) \to 1$ and $u_{1k}(t) \to 0$ for $k > 1$ exponentially as $t \to \infty$. It follows 
therefore from (39) that $u_{21}(t) \to 0$ exponentially as $t \to \infty$. 
(v) From (31) we deduce that 

$$\frac{u_{2k}}{u_{22}} = \frac{d_k - a_1}{d_2 - a_1} \, \frac{u_{1k}}{u_{12}}. \tag{40}$$

By the explicit formula (38) we can write this as 

$$\frac{u_{2k}(t)}{u_{22}(t)} = \frac{d_k - a_1(t)}{d_2 - a_1(t)} \, \frac{u_{1k}(0)}{u_{12}(0)} \, e^{(d_k - d_2) t}. \tag{41}$$

Take $k > 2$; then the right-hand side of (41) tends to zero as $t \to \infty$, and therefore 
$u_{2k}(t) \to 0$ as $t \to \infty$ for $k > 2$. We have shown in (iv) that $u_{21}(t) \to 0$ as $t \to \infty$. 
Since $(u_{21}, \ldots, u_{2n})$ is a unit vector, it follows that $u_{22}^2(t) \to 1$ exponentially. 

(vi) According to formula $(32)_2$, 

$$a_2(t) = \sum_k d_k u_{2k}^2(t).$$

Since we have shown in (v) that $u_{2k}(t) \to 0$ for $k \neq 2$ and that $u_{22}^2 \to 1$, it follows 
that $a_2(t) \to d_2$. 
(vii) Formula $(34)_2$ represents $b_2^2(t)$ as a sum. We have shown above that all 
terms of this sum tend to zero as $t \to \infty$. It follows that $b_2(t) \to 0$ as 
$t \to \infty$, at the usual exponential rate. 


The limiting behavior of the rest of the entries can be argued similarly; Deift et al. 
supply all the details. 

Identical arguments show that L(t) tends to $D_-$ as $t \to -\infty$. 

Moser's proof of Theorem 7 runs along different lines. 
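
Theorem 7 can be observed numerically by integrating the entrywise Toda equations until the off-diagonal entries are small. The Python sketch below assumes the equations $\dot a_k = 2(b_k^2 - b_{k-1}^2)$, $\dot b_k = b_k(a_{k+1} - a_k)$ with $b_0 = b_n = 0$, and uses the classical fourth-order Runge-Kutta method with a fixed step; the function names, step size, tolerance, and sample matrix are assumptions chosen for illustration, not the method of Deift et al.

```python
import math

def toda_rhs(a, b):
    """Right-hand side of the entrywise Toda equations, with b_0 = b_n = 0."""
    n = len(a)
    bb = [0.0] + list(b) + [0.0]
    da = [2.0 * (bb[k + 1] ** 2 - bb[k] ** 2) for k in range(n)]
    db = [b[k] * (a[k + 1] - a[k]) for k in range(n - 1)]
    return da, db

def rk4_step(a, b, h):
    """One classical Runge-Kutta step of size h."""
    def moved(s, da, db):
        return ([x + s * d for x, d in zip(a, da)],
                [x + s * d for x, d in zip(b, db)])
    k1 = toda_rhs(a, b)
    k2 = toda_rhs(*moved(h / 2, *k1))
    k3 = toda_rhs(*moved(h / 2, *k2))
    k4 = toda_rhs(*moved(h, *k3))
    new_a = [x + h / 6 * (p + 2 * q + 2 * r + s)
             for x, p, q, r, s in zip(a, k1[0], k2[0], k3[0], k4[0])]
    new_b = [x + h / 6 * (p + 2 * q + 2 * r + s)
             for x, p, q, r, s in zip(b, k1[1], k2[1], k3[1], k4[1])]
    return new_a, new_b

def toda_limit(a, b, h=0.01, tol=1e-10, max_steps=100000):
    """Integrate forward in time until every off-diagonal entry is below tol."""
    a, b = list(a), list(b)
    steps = 0
    while max(abs(x) for x in b) > tol and steps < max_steps:
        a, b = rk4_step(a, b, h)
        steps += 1
    return a, b

# Sample matrix with a = (2, 2, 2), b = (1, 1); its eigenvalues are 2 + sqrt(2), 2, 2 - sqrt(2).
a_inf, b_inf = toda_limit([2.0, 2.0, 2.0], [1.0, 1.0])
```

The diagonal `a_inf` approximates the eigenvalues in decreasing order. The trace is a linear invariant of the equations, so the integrator preserves it up to rounding, while the individual eigenvalues are preserved only up to the discretization error of the method.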

We conclude this chapter with four observations. 

Note 1. It may surprise the reader that in Lemma 8 we present an explicit solution. 
The explanation is that the Toda lattice, of which (21) is a form, is completely 
integrable. According to Liouville's Theorem, such systems have explicit solutions. 

Note 2. Moser's Theorem is a continuous analogue of the convergence of the QR 
algorithm to D when applied to a tridiagonal matrix. 

Note 3. Deift et al. point out that (21) is only one of a whole class of flows of 
tridiagonal symmetric matrices that tend to D as $t \to \infty$. These flows are in 
commutator form (21), where the matrix B is taken as 

$$B = p(L)_+ - p(L)_-,$$

where p is a polynomial, $M_+$ denotes the upper triangular part of M, and $M_-$ denotes 
its lower triangular part. The choice (23) for B corresponds to the choice $p(L) = L$. 

Note 4. Deift et al. point out that solving numerically the matrix differential 
equation (21) until such time when $b_1, \ldots, b_{n-1}$ become less than a preassigned small 
number is a valid numerical method for finding approximately the eigenvalues of L. 
In Section 4 of their paper they present numerical examples comparing the speed of 
this method with the speed of the QR algorithm. 
Solutions of Selected Exercises 


CHAPTER 1 
Ex 1. Suppose z is another zero: 
$x + z = x$ for all x. 
Set $x = 0$: $0 + z = 0$. But also $z + 0 = z$, so $z = 0$. 
Ex 3. Every polynomial p of degree < n can be written as 
$p = a_1 x^{n-1} + a_2 x^{n-2} + \cdots + a_n$. 
Then $p \to (a_1, \ldots, a_n)$ is an isomorphism. 


Ex 7. If $x_1, x_2$ belong to X and to Y, then $x_1 + x_2$ belongs to X and Y. 


Ex 10. If $x_i$ were 0, then 
$$1 \cdot x_i + \sum_{j \neq i} 0 \cdot x_j = 0.$$


Ex 13. (i) If $x_1 - x_2$ is in Y, and $x_2 - x_3$ is in Y, then so is their sum 
$x_1 - x_2 + x_2 - x_3 = x_1 - x_3$. 


Ex 14. Suppose $\{x_1\}$ and $\{x_2\}$ have a vector $x_3$ in common. Then $x_3 \equiv x_1$ and 
$x_3 \equiv x_2$; but then $x_1 \equiv x_2$, so $\{x_1\} = \{x_2\}$. 

Linear Algebra and Its Applications, Second Edition, by Peter D. Lax 
Copyright © 2007 John Wiley & Sons, Inc. 




Ex 16. (i) Polynomials of degree < n that are zero at $t_1, \ldots, t_j$ can be written in the 
form 

$$q(t) \prod_{i=1}^{j} (t - t_i),$$

where q is a polynomial of degree < n − j. These clearly form a linear space, whose 
dimension is n − j. 
By Theorem 6, 

$$\dim X/Y = \dim X - \dim Y = n - (n - j) = j.$$

The quotient space X/Y can be identified with the space of vectors 

$$(p(t_1), \ldots, p(t_j)).$$

Ex 19. Use Theorem 6 and Exercise 18. 


Ex 20. (b) and (d) are subspaces. 


Ex 21. The statement is false; here is an example to the contrary: 
$X = \mathbb{R}^2$ = (x, y) space, 
$U = \{y = 0\}$, $V = \{x = 0\}$, $W = \{x = y\}$. 
$U + V + W = \mathbb{R}^2$, $U \cap V = \{0\}$, $U \cap W = \{0\}$, 
$V \cap W = \{0\}$, $U \cap V \cap W = \{0\}$. 


So 
$$2 \neq 1 + 1 + 1 - 0 - 0 - 0 + 0.$$


CHAPTER 2 
Ex 4. We choose $m_1 = m_3$; then (9) is satisfied for $p(t) = t$. For $p(t) = 1$ and 
$p(t) = t^2$, (9) says that 

$$2 = 2m_1 + m_2, \qquad \frac{2}{3} = 2 m_1 a^2.$$

So 

$$m_1 = \frac{1}{3a^2} \quad \text{and} \quad m_2 = 2 - \frac{2}{3a^2},$$

from which (ii) follows. (iii) (9) holds for all odd polynomials like $t^3$ and $t^5$. For 
$p(t) = t^4$, (9) says that 

$$\frac{2}{5} = 2 m_1 a^4 = \frac{2}{3} a^2,$$

which holds for $a = \sqrt{3/5}$. 
Ex 5. Take $m_1 = m_4$, $m_2 = m_3$, in order to satisfy (9) for all odd polynomials. For 
$p(t) = 1$ and $p(t) = t^2$ we get two equations easily solved. 


Ex 6. (a) Suppose there is a linear relation 

$$a\, l_1(p) + b\, l_2(p) + c\, l_3(p) = 0.$$

Set $p = p(x) = (x - \xi_2)(x - \xi_3)$. Then $p(\xi_2) = p(\xi_3) = 0$, $p(\xi_1) \neq 0$; so we get 
from the above relation that a = 0. Similarly b = 0, c = 0. 

(b) Since $\dim P_2 = 3$, also $\dim P_2' = 3$. Since $l_1, l_2, l_3$ are linearly independent, 
they span $P_2'$. 

(c) Set 

$$p_1(x) = \frac{(x - \xi_2)(x - \xi_3)}{(\xi_1 - \xi_2)(\xi_1 - \xi_3)},$$

and define $p_2, p_3$ analogously. Clearly 

$$l_i(p_j) = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j. \end{cases}$$


Ex 7. $l(x)$ has to be zero for x = (1, 0, −1, 2) and x = (2, 3, 1, 1). These yield two 
equations for $c_1, \ldots, c_4$: 

$$c_1 - c_3 + 2c_4 = 0, \qquad 2c_1 + 3c_2 + c_3 + c_4 = 0.$$

We express $c_1$ and $c_2$ in terms of $c_3$ and $c_4$. From the first equation, $c_1 = c_3 - 2c_4$. 
Setting this into the second equation gives $c_2 = -c_3 + c_4$. 


CHAPTER 3 
Ex 1. If $Ty_1 = u_1$, $Ty_2 = u_2$, then $T(y_1 + y_2) = u_1 + u_2$, and conversely. 


Ex 2. Suppose we drop the ith equation; if the remaining equations do not 
determine x uniquely, there is an x that is mapped into a vector whose components 
except the ith are zero. If this were true for all i = 1, ..., m, the range of the mapping 
$x \to u$ would be m-dimensional; but according to Theorem 2, the dimension of the 
range is $\leq n < m$. Therefore one of the equations may be dropped without losing 
uniqueness; by induction m − n of the equations may be omitted. 


Ex 4. Rotation maps the parallelogram 0, x, y, x + y into another parallelogram 
0, x', y', z'; therefore z' = x' + y'. 
ST maps (1, 0, 0) into (0, 1, 0); TS maps (1, 0, 0) into (0, 0, 1). 
Ex 5. Set Tx = u; then $(T^{-1}T)x = T^{-1}u = x$, and $(TT^{-1})u = Tx = u$. 

Ex 6. Part (ii) is true for all mappings, linear or nonlinear. Part (iii) was illustrated 
by Eisenstein as follows: the inverse of putting on your shirt and then your jacket is 
taking off your jacket and then your shirt. 


Ex 7. $(STl, x) = (l, (ST)'x)$; 
also 
$(STl, x) = (Tl, S'x) = (l, T'S'x)$, 
from which $(ST)' = T'S'$ follows. 
Ex 8. $(Tl, x) = (l, T'x) = (T''l, x)$ for all x; therefore $Tl = T''l$. 
Ex 10. If $M = SKS^{-1}$, then $S^{-1}MS = K$, and by Theorem 4, 
$S^{-1}M^{-1}S = K^{-1}$. 
Ex 11. $AB = ABAA^{-1} = A(BA)A^{-1}$, by repeated use of the associative law. 


Ex 13. The even part of an even function is the function itself. 


CHAPTER 4 
Ex 1. $(DA)_{ij} = \sum_k D_{ik} A_{kj} = d_i A_{ij}$, $(AD)_{ij} = \sum_k A_{ik} D_{kj} = A_{ij} d_j$. 
Ex 2. In most texts the proof is obscure. 


Ex 4. Choose B so that its range is the nullspace of A, but the range of A is not the 
nullspace of B. 


CHAPTER 5 
Ex 1. $P(p_1 \circ p_2(x)) = \sigma(p_1 \circ p_2) P(x)$. Since $p_1 \circ p_2(x) = p_1(p_2(x))$, 
$P(p_1 \circ p_2(x)) = P(p_1(p_2(x))) = \sigma(p_1) P(p_2(x))$; 
also 
$P(p_2(x)) = \sigma(p_2) P(x)$. 
Combining these identities yields $\sigma(p_1 \circ p_2) = \sigma(p_1)\sigma(p_2)$. 
Ex 2. (c) The signature of the transposition of two adjacent variables $x_k$ and $x_{k+1}$ 
is −1. The transposition of any two variables can be obtained by composing 
an odd number of interchanges of adjacent variables. The result follows from 
Exercise 1. 
(d) To factor p — lt asa product p = fg... tı of transpositions, set 
H 


Pi «= Pa 
o 12-prn 
i py 2---1---n 
| 2---po---n 
Il, = 
| p;e2.en 
and so on. 
Ex 3. Follows from (7),. 
Ex 4. (iii) When $a_1 = e_1, \ldots, a_n = e_n$, the only nonzero term on the right side in 
(16) is p = identity. 
(iv) When $a_i$ and $a_j$ are interchanged, the right side of (16) can be written as 

$$\sum \sigma(t \circ p)\, a_{p_1 1} \cdots a_{p_n n},$$

where t is the transposition of i and j. The result follows from 
$\sigma(t \circ p) = \sigma(t)\sigma(p) = -\sigma(p)$. 


Ex 5. Suppose two columns of A are equal. Then, by (iv), 

$$D(a, a) = -D(a, a),$$

so $2D(a, a) = 0$. 


CHAPTER 6 
Ex2. (a) All terms in (14) tend to zero. 
(b) Each component of A” is a sum of exponential functions of N, with distinct 


positive exponents. 


Ex 5. (25) is a special case of (26), with $q(a) = a^N$. The general case follows by 
combining relations (25) for various values of N. 


Ex 7. For x in $N_d$, 

$$(A - aI)^d A x = A (A - aI)^d x = 0.$$

Ex 8. Let p(s) be a polynomial of degree less than $\sum d_j$. Then some $a_j$ is a root of p 
of order less than $d_j$. But then p(A) does not map all of $N_{d_j}$ into 0. 


Ex 12. h = (1, —1), h = (1,2), 
(1,44) = 3, (41, 12) = 0, 
(h, hy) zz Li, (15, hz) — 


aJ 


CHAPTER 7 
Ex 1. According to the Schwarz inequality, $(x, y) \leq \|x\|$ for all unit vectors y. For 
$y = x/\|x\|$ equality holds. 


Ex 2. Let Y denote any subspace of X, x and z any pair of vectors in X. Decompose 
them as 

$$x = y + y^\perp, \qquad z = u + u^\perp,$$

where y and u are in Y, $y^\perp$ and $u^\perp$ orthogonal to Y; then 

$$Px = y, \qquad Pz = u,$$

P orthogonal projection into Y. 

$$(Px, z) = (y, u + u^\perp) = (y, u);$$
$$(x, Pz) = (y + y^\perp, u) = (y, u).$$

This proves that P is its own adjoint. 


Ex 3. Reflection across the plane $x_3 = 0$ maps $(x_1, x_2, x_3)$ into $(x_1, x_2, -x_3)$. The 
matrix representing this mapping is 

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix},$$

whose determinant is −1. 

Ex 5. If the rows of M are pairwise orthogonal unit vectors, then according to the 
rules of matrix multiplication, $MM^T = I$. Since a right inverse is a left inverse as 
well, $M^T M = I$; from this it follows from the rules of matrix multiplication that the 
columns of M are pairwise orthogonal unit vectors. 
Ex 6. $a_{ij} = (Ae_j, e_i)$. By the Schwarz inequality 

$$|a_{ij}| \leq \|Ae_j\| \, \|e_i\|.$$

From the definition of norm 

$$\|Ae_j\| \leq \|A\| \, \|e_j\|.$$

Since $\|e_i\| = \|e_j\| = 1$, we deduce that 

$$|a_{ij}| \leq \|A\|.$$

Ex 7. Let $x_1, \ldots, x_n$ be an orthonormal basis for X. Then any x in X can be 
expressed as 

$$x = \sum a_j x_j,$$

and 

$$\|x\|^2 = \sum |a_j|^2.$$

We can write Ax as 

$$Ax = \sum a_j A x_j,$$

so 

$$\|Ax\| \leq \sum |a_j| \, \|Ax_j\|.$$

Using the classical Schwarz inequality yields 

$$\|Ax\|^2 \leq \sum |a_j|^2 \sum \|Ax_j\|^2,$$

from which 

$$\|A\|^2 \leq \sum \|Ax_j\|^2$$

follows. 
Apply this inequality to $A_n - A$ in place of A to deduce that if $(A_n - A)x_j$ 
converges to zero for all $x_j$, so does $\|A_n - A\|$. 


Ex 8. According to identity (44), 

$$\|x + y\|^2 = \|x\|^2 + 2\,\mathrm{Re}(x, y) + \|y\|^2.$$

Replace y by ty, where $t = -\mathrm{Re}(x, y)/\|y\|^2$. Since the left side is nonnegative, 
we get 

$$|\mathrm{Re}(x, y)| \leq \|x\| \, \|y\|.$$

Replace x by kx, |k| = 1, and choose k so that the left side is maximized; we obtain 

$$|(x, y)| \leq \|x\| \, \|y\|.$$


Ex 14. For any mapping A, 
$$\det A^* = \overline{\det A}.$$
For M unitary, $M^*M = I$; by the multiplicative property of determinants, 
$$\det M^* \det M = \det I = 1.$$
Using $\det M^* = \overline{\det M}$ we deduce 

$$|\det M|^2 = 1.$$


Ex 17. $(AA^*)_{ii} = \sum_k a_{ik} a^*_{ki} = \sum_k a_{ik} \bar{a}_{ik} = \sum_k |a_{ik}|^2$; 
so 

$$\mathrm{tr}\, AA^* = \sum_i (AA^*)_{ii} = \sum_{i,k} |a_{ik}|^2.$$

i n 

Ex 19. For $A = \begin{pmatrix} 1 & 2 \\ 0 & 3 \end{pmatrix}$, tr A = 4, det A = 3, so the characteristic equation of A is 

$$a^2 - 4a + 3 = 0.$$

Its roots are the eigenvalues of A; the larger root is a = 3. 
On the other hand, $\sum |a_{ij}|^2 = 1 + 4 + 9 = 14$; $\sqrt{14} \approx 3.74$, so by (46) and (51) 

$$3 \leq \|A\| \leq 3.74.$$

For the value of $\|A\| \approx 3.65$, see Ex. 12 in Chapter 8. 


Ex 20. (i) Since det (x, y, z) is a multilinear function of x and y when the other 
variables are held fixed, w(x, y) is a bilinear function of x and y. 

(ii) follows from det (y, x, z) = − det (x, y, z). 

(iii) is true because det (x, y, x) = 0 and det (x, y, y) = 0. 

(iv) Multiply the matrix (x, y, z) by R: R(x, y, z) = (Rx, Ry, Rz). 
By the multiplicative property of determinants, and since det R = 1, 
$$\det(x, y, z) = \det(Rx, Ry, Rz);$$
therefore 
$$(w(x, y), z) = (w(Rx, Ry), Rz) = (R^T w(Rx, Ry), z),$$
from which 
$$w(x, y) = R^T w(Rx, Ry)$$
follows. Multiply both sides by R. 
(v) Take $x_0 = a(1, 0, 0)^T$, $y_0 = b(\cos\theta, \sin\theta, 0)^T$. 

$$(x_0 \times y_0, z) = \det \begin{pmatrix} a & b\cos\theta & z_1 \\ 0 & b\sin\theta & z_2 \\ 0 & 0 & z_3 \end{pmatrix} = (ab\sin\theta)\, z_3.$$

Therefore 

$$x_0 \times y_0 = ab\sin\theta\, (0, 0, 1)^T.$$

Since $a = \|x_0\|$, $b = \|y_0\|$, 
$$\|x_0 \times y_0\| = \|x_0\| \, \|y_0\| \, |\sin\theta|.$$

Any pair of vectors x, y that make an angle θ can be rotated into $x_0$, $y_0$; using (iv) 
we deduce 

$$\|x \times y\| = \|x\| \, \|y\| \, |\sin\theta|.$$
CHAPTER 8 


Ex 1. $(x, Mx) = (M^*x, x) = \overline{(x, M^*x)}$; 

$$\mathrm{Re}(x, Mx) = \frac{1}{2}\left[(x, Mx) + (x, M^*x)\right] = \left(x, \frac{M + M^*}{2}\, x\right).$$

Ex 4. Multiply (24)' by M on the right; since $M^*M = I$, 
$$HM = MD.$$

The jth column of the left side is $Hm_j$, where $m_j$ is the jth column of M. The jth 
column of the right side is $d_j m_j$; therefore 

$$Hm_j = d_j m_j.$$


Ex 8. Let a be an eigenvalue of $M^{-1}H$, u an eigenvector: 
$$M^{-1}Hu = au.$$
Multiply both sides on the left by M, and take the inner product with u: 
$$(Hu, u) = a(Mu, u).$$
Since M is positive, 

$$\frac{(Hu, u)}{(Mu, u)} = a.$$

This proves that a is real. 
Ex 10. A normal matrix N has a full set of orthonormal eigenvectors $f_1, \ldots, f_n$: 
$$Nf_j = n_j f_j.$$
Any vector x can be expressed as 
$$x = \sum a_j f_j, \qquad \|x\|^2 = \sum |a_j|^2,$$
while 
$$Nx = \sum a_j n_j f_j, \qquad \|Nx\|^2 = \sum |a_j|^2 |n_j|^2,$$
so 
$$\|Nx\| \leq \max |n_j| \, \|x\|,$$
with equality holding for $x = f_m$, $|n_m| = \max |n_j|$. This proves that 

$$\|N\| = \max |n_j|.$$

Ex 11. (b) An eigenvector of S, with eigenvalue v, satisfies 


fii = vf}. J = 2, "o. m sM, fn = vfi. 


SO 


ds 
ve = exp ——k, Kms 
n 


and 


Their scalar product, 


fork Æl. 
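Ex 12. (i) $A^*A = \begin{pmatrix} 1 & 0 \\ 2 & 3 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 0 & 3 \end{pmatrix} = \begin{pmatrix} 1 & 2 \\ 2 & 13 \end{pmatrix}$. 
The characteristic equation of $A^*A$ is 

$$\lambda^2 - 14\lambda + 9 = 0.$$

The larger root is 

$$\lambda_{\max} = 7 + \sqrt{40} \approx 13.325.$$

By Theorem 13, 
$$\|A\| = \sqrt{\lambda_{\max}} \approx 3.65.$$

(ii) This is consistent with the estimate obtained in Ex. 19 of Chapter 7: 

$$3 \leq \|A\| \leq 3.74.$$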
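
The arithmetic of this solution can be checked with a few lines of Python. The sketch assumes the matrix of the exercise is $A = \begin{pmatrix} 1 & 2 \\ 0 & 3 \end{pmatrix}$, which is consistent with the trace, determinant, and sum of squares quoted in Ex. 19 of Chapter 7; the variable names are illustrative.

```python
import math

A = [[1.0, 2.0], [0.0, 3.0]]          # assumed matrix of the exercise

# Entries of the symmetric matrix A^T A.
g11 = A[0][0] ** 2 + A[1][0] ** 2
g12 = A[0][0] * A[0][1] + A[1][0] * A[1][1]
g22 = A[0][1] ** 2 + A[1][1] ** 2

trace = g11 + g22
det = g11 * g22 - g12 ** 2
# Larger root of lambda^2 - trace * lambda + det = 0.
lam_max = (trace + math.sqrt(trace ** 2 - 4.0 * det)) / 2.0
norm_A = math.sqrt(lam_max)

spectral_radius = max(abs(A[0][0]), abs(A[1][1]))            # A is triangular
frobenius = math.sqrt(sum(x * x for row in A for x in row))  # square root of 14
```

The computed `norm_A` lies between `spectral_radius` and `frobenius`, in agreement with the two-sided estimate in part (ii).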


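Ex 13. 
$$\begin{pmatrix} 1 & 0 & -1 \\ 2 & 3 & 0 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 0 & 3 \\ -1 & 0 \end{pmatrix} = \begin{pmatrix} 2 & 2 \\ 2 & 13 \end{pmatrix}.$$
The characteristic equation of the matrix on the right is 

$$\lambda^2 - 15\lambda + 22 = 0,$$

whose larger root is 

$$\lambda_{\max} = \frac{15 + \sqrt{137}}{2} \approx 13.35.$$

By Theorem 13, 

$$\left\| \begin{pmatrix} 1 & 0 & -1 \\ 2 & 3 & 0 \end{pmatrix} \right\| \approx \sqrt{13.35} \approx 3.65.$$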


CHAPTER 9 


Ex 2. Differentiate $A^{-1}A = I$ using the product rule: 

$$\left(\frac{d}{dt} A^{-1}\right) A + A^{-1} \frac{d}{dt} A = 0.$$

Solve for $\frac{d}{dt} A^{-1}$; (3) results. 


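Ex 3. Denote $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ as C; $C^2 = I$, so $C^n = C$ for n odd, = I for n even. 

So 

$$\exp C = C\left(1 + \frac{1}{3!} + \frac{1}{5!} + \cdots\right) + I\left(1 + \frac{1}{2!} + \frac{1}{4!} + \cdots\right) = C\, \frac{e - e^{-1}}{2} + I\, \frac{e + e^{-1}}{2} \approx \begin{pmatrix} 1.54 & 1.18 \\ 1.18 & 1.54 \end{pmatrix}.$$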
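
The value of exp C can be confirmed by summing the exponential series directly. The Python sketch below assumes $C = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, the matrix consistent with $C^2 = I$ and the stated result; the function name `exp_series` and the number of terms are illustrative.

```python
import math

def exp_series(C, terms=30):
    """Partial sum of I + C + C^2/2! + ... for a 2 x 2 matrix C."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    power = [[1.0, 0.0], [0.0, 1.0]]      # holds C^n / n!
    for n in range(1, terms):
        power = [[sum(power[i][k] * C[k][j] for k in range(2)) / n
                  for j in range(2)] for i in range(2)]
        result = [[result[i][j] + power[i][j] for j in range(2)]
                  for i in range(2)]
    return result

C = [[0.0, 1.0], [1.0, 0.0]]              # assumed matrix of the exercise
E = exp_series(C)
```

The diagonal entries of `E` agree with cosh 1 and the off-diagonal entries with sinh 1, which is what the closed form $I \cosh 1 + C \sinh 1$ predicts.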
Ex 6. For Y(t) = exp At, 

$$\frac{d}{dt} Y(t) = (\exp At) A, \qquad Y^{-1} \frac{d}{dt} Y = A.$$

By formula (10) 

$$\frac{d}{dt} \log \det \exp At = \mathrm{tr}\, A,$$

so 

$$\log \det \exp At = t\, \mathrm{tr}\, A.$$

Thus 

$$\det \exp At = \exp(t\, \mathrm{tr}\, A).$$


Ex 7. According to Theorem 4 in Chapter 6, for any polynomial p, the 
eigenvalues of p(A) are of the form p(a), a an eigenvalue of A. To extend this 
from polynomials to the exponential function we note that $e^s$ is defined as the 
limit of polynomials $e_m(s)$ that are defined by formula (12). To complete the 
proof we apply Theorem 6. 

In Ex. 6 we have shown that det exp A = exp(tr A); this indicates that the 
multiplicity of $e^a$ as an eigenvalue of $e^A$ is the same as the multiplicity of a as an 
eigenvalue of A. 


CHAPTER 10 


Ex 1. In formula (6) for $\sqrt{H}$ we may take $\sqrt{a_j}$ to be either the positive or negative 
square root. This shows that if H has n distinct, nonzero eigenvalues, H has $2^n$ square 
roots. If one of the nonzero eigenvalues of H has multiplicity greater than one, H has 
infinitely many square roots. 

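Ex 3. $A = \begin{pmatrix} 1 & 2 \\ 2 & 5 \end{pmatrix}$ is positive; it maps (1, 0) into (1, 2). $B = \begin{pmatrix} 1 & -2 \\ -2 & 5 \end{pmatrix}$ is positive; it maps 
(1, 0) into (1, −2). The vectors (1, 2) and (1, −2) make an angle > π/2, so AB + BA 
is not positive. Indeed, 

$$AB + BA = \begin{pmatrix} -6 & 0 \\ 0 & 42 \end{pmatrix}$$

has one negative eigenvalue. 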


Ex 4. (a) Apply Theorem 5 twice. 
(b) Apply Theorem 5 k times, where $2^k = m$. 
(c) The limit of 
$$m\left[M^{1/m} - I\right] \leq m\left[N^{1/m} - I\right]$$
gives 

$$\log M \leq \log N.$$

Note. (b) remains true for all positive exponents m > 1. 


Ex 5. Choose A and B as in Exercise 3, that is, positive matrices whose 
symmetrized product is not positive. Set 

$$M = A, \qquad N = A + tB,$$

t a sufficiently small positive number. Clearly, M < N. 

$$N^2 = A^2 + t(AB + BA) + t^2 B^2;$$

for t small the term $t^2 B^2$ is negligible compared with the linear term. Therefore for t 
small $N^2$ is not greater than $M^2$. 


Ex 6. We claim that the functions $f(s) = -(s + t)^{-1}$, t positive, are monotone 
matrix functions. For if 0 < M < N, then 0 < M + tI < N + tI, and so by 
Theorem 2, 

$$(M + tI)^{-1} > (N + tI)^{-1}.$$

The function f(s) defined by (19) is the limit of linear combinations with positive 
coefficients of functions of the form s and $-(s + t)^{-1}$, t > 0. Linear combinations 
of monotone matrix functions with positive coefficients are monotone, and so are their limits. 

Note 1. Loewner also proved the converse of the theorem stated: Every monotone 
matrix function is of form (19). 

Note 2. Every function f(s) of form (19) can be extended to an analytic function 
into the complex upper half plane Im s > 0, so that the imaginary part of f(s) is 
positive there, and zero on the positive real axis s > 0. According to a theorem of 
Herglotz, all such functions f(s) can be represented in the form (19). 

It is easy to verify that the functions $s^m$, 0 < m < 1, and the function log s have 
positive imaginary parts in the upper half plane. 


Ex 7. The matrix 

$$G_{ij} = \frac{1}{r_i + r_j + 1}, \qquad r_i > 0,$$

is a Gram matrix: 

$$G_{ij} = \int_0^1 f_i(t) f_j(t)\, dt, \qquad f_j(t) = t^{r_j}.$$

Ex 10. By the Schwarz inequality, and the definition of the norm of M − N, 

$$(u, (M - N)u) \leq \|u\| \, \|(M - N)u\| \leq \|u\|^2 \, \|M - N\| = d \|u\|^2.$$

Therefore 
$$(u, Mu) \leq (u, Nu) + d \|u\|^2 = (u, (N + dI)u).$$
This proves that $M \leq N + dI$; the other inequality follows by interchanging the roles 
of M and N. 
Ex 11. Arrange the $m_i$ in increasing order: 
$$m_1 \leq \cdots \leq m_k.$$

Suppose the $n_i$ are not in increasing order, that is, that for a pair of indices i < j, 
$n_i > n_j$. We claim that interchanging $n_i$ and $n_j$ increases the sum (51): 

$$n_i m_i + n_j m_j \leq n_j m_i + n_i m_j.$$

For rewrite this inequality as 

$$(n_i - n_j) m_i + (n_j - n_i) m_j = (n_i - n_j)(m_i - m_j) \leq 0,$$

which is manifestly true. A finite number of interchanges shows that (51) is 
maximized when the $n_i$ and $m_i$ are arranged in the same order. 


Ex 12. If Z were not invertible, it would have zero as an eigenvalue, contradicting 
Theorem 20. 
Let h be any vector; denote $Z^{-1}h$ by k. Then 

$$(Z^{-1}h, h) = (k, Zk).$$

Since the self-adjoint part of Z is positive, the real part of the right side above is positive. But then 
so is the real part of the left side, which proves that the self-adjoint part of $Z^{-1}$ is positive. 


Ex 13. When A is invertible, $AA^*$ and $A^*A$ are similar: 
$$A^*A = A^{-1}(AA^*)A,$$
and therefore have the same eigenvalues. Noninvertible A can be obtained as the 
limit of a sequence of invertible matrices. 


Ex 14. Let u be an eigenvector of $A^*A$, with nonzero eigenvalue: 

$$A^*Au = ru, \qquad r \neq 0.$$

Denote Au as v; the vector v is nonzero, for if Au = 0, it follows from the above 
relation that u = 0. 
Let A act on the above relation: 

$$AA^*Au = rAu,$$

which can be rewritten as 

$$AA^*v = rv;$$

which shows that v is an eigenvector of $AA^*$, with eigenvalue r. 
A maps the eigenspace of $A^*A$ with eigenvalue r into the eigenspace of $AA^*$; this 
mapping is 1-to-1. Similarly $A^*$ maps the eigenspace of $AA^*$ into the eigenspace of 
$A^*A$ in a 1-to-1 fashion. This proves that these eigenspaces have the same 
dimension. 


Ex 15. Take $Z = \begin{pmatrix} 1 & a \\ 0 & 2 \end{pmatrix}$, a some real number; its eigenvalues are 1 and 2. But 

$$Z + Z^* = \begin{pmatrix} 2 & a \\ a & 4 \end{pmatrix}$$

is not positive when $a > \sqrt{8}$. 


CHAPTER 11 
Ex 1. If $M_t = AM$, $M_t^T = M^T A^T = -M^T A$. 

Then 
$$(M^T M)_t = M_t^T M + M^T M_t = -M^T AM + M^T AM = 0.$$

Since $M^T M = I$ at t = 0, $M^T M = I$ for all t. At t = 0, det M = 1, therefore 
det M = 1 for all t. This proves that M(t) is a rotation. 


Ex 5. The nonzero eigenvalues of a real antisymmetric matrix A are pure imaginary 
and come in conjugate pairs ik and −ik. The eigenvalues of $A^2$ are 0, $-k^2$, $-k^2$, so 
$\mathrm{tr}\, A^2 = -2k^2$. The diagonal entries of $A^2$ are $-(a^2 + b^2)$, $-(a^2 + c^2)$ and 
$-(b^2 + c^2)$, so $\mathrm{tr}\, A^2 = -2(a^2 + b^2 + c^2)$. Therefore $k = \sqrt{a^2 + b^2 + c^2}$. 


Ex 6. The eigenvalues of $e^{At}$ are $e^{at}$, where a are the eigenvalues of A. Since the 
eigenvalues of A are 0, ±ik, the eigenvalues of $e^{At}$ are 1 and $e^{\pm ikt}$. From Af = 0 we 
deduce that $e^{At}f = f$; thus f is the axis of the rotation $e^{At}$. The trace of $e^{At}$ is 
$1 + e^{ikt} + e^{-ikt} = 2\cos kt + 1$. According to formula (4)', the angle of rotation θ of 
$M = e^{At}$ satisfies $2\cos\theta + 1 = \mathrm{tr}\, e^{At}$. This shows that $\theta = kt = \sqrt{a^2 + b^2 + c^2}\; t$. 


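Ex 8.

$$A = \begin{pmatrix} 0 & a & b \\ -a & 0 & c \\ -b & -c & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 0 & d & e \\ -d & 0 & g \\ -e & -g & 0 \end{pmatrix};$$

their null vectors are

$$f_A = \begin{pmatrix} -c \\ b \\ -a \end{pmatrix}, \qquad f_B = \begin{pmatrix} -g \\ e \\ -d \end{pmatrix}.$$

$$AB = -\begin{pmatrix} ad + be & bg & -ag \\ ce & ad + cg & ae \\ -cd & bd & be + cg \end{pmatrix}.$$

Therefore $\operatorname{tr} AB = -2(ad + be + cg)$, whereas the scalar product of $f_A$ and $f_B$ is $cg + be + ad$.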


Ex 9. $BA$ can be calculated like $AB$, given above. Subtracting we get

$$AB - BA = \begin{pmatrix} 0 & ec - bg & ag - dc \\ bg - ec & 0 & db - ae \\ dc - ag & ae - db & 0 \end{pmatrix}.$$

Therefore

$$f_{[A,B]} = \begin{pmatrix} ae - db \\ ag - dc \\ bg - ec \end{pmatrix}.$$

We can verify that $f_A \times f_B = f_{[A,B]}$, by using the formula for the cross product in Chapter 7.


CHAPTER 12 


Ex 2. (a) Let $\{K_j\}$ be a collection of convex sets, denote their intersection by $K$. If $x$ and $y$ belong to $K$, they belong to every $K_j$. Since $K_j$ is convex, it contains the line segment with endpoints $x$ and $y$. Since this line segment belongs to all $K_j$, it belongs to $K$. This shows that $K$ is convex.

(b) Let $x$ and $y$ be two points in $H + K$; that means that they are of the form

$$x = u + z, \quad y = v + w, \qquad u \text{ and } v \text{ in } H, \ z \text{ and } w \text{ in } K.$$

Since $H$ is convex, $au + (1 - a)v$ belongs to $H$, and since $K$ is convex $az + (1 - a)w$ belongs to $K$ for $0 \le a \le 1$. But then their sum

$$au + (1 - a)v + az + (1 - a)w = a(u + z) + (1 - a)(v + w) = ax + (1 - a)y$$

belongs to $H + K$. This proves that $H + K$ is convex.

Ex 6. Denote $(u, v)$ as $x$. If both $u$ and $v$ are $\le 0$, then $x/r$ belongs to $K$ for any positive $r$, no matter how small. So $p(x) = 0$.

If $0 < v$ and $u \le v$, then $x/r = (u/r, v/r)$ belongs to $K$ for $r > v$, but no smaller $r$. Therefore $p(x) = v$. We can argue similarly in the remaining case.


Ex 7. If $p(x) < 1$, $p(y) < 1$, and $0 < a < 1$, then by subadditivity and homogeneity of $p$,

$$p(ax + (1 - a)y) \le p(ax) + p((1 - a)y) = ap(x) + (1 - a)p(y) < 1.$$

This shows that the set of $x : p(x) < 1$ is convex.

To show that the set $p(x) < 1$ is open we argue as follows. By subadditivity and positive homogeneity

$$p(x + ty) \le p(x) + p(ty) = p(x) + tp(y).$$

Since $p(x) < 1$, $p(x) + tp(y) < 1$ for all $t$ positive but small enough.


Ex 8.

$$q_S(m + l) = \sup_{x \text{ in } S} (m + l)(x) = \sup_{x \text{ in } S} \bigl(m(x) + l(x)\bigr) \le \sup_{x \text{ in } S} m(x) + \sup_{x \text{ in } S} l(x) = q_S(m) + q_S(l).$$


Note. This is a special case of the result that the supremum of linear functions is 
subadditive. 


Ex 10.

$$q_{S \cup T}(l) = \sup_{x \text{ in } S \cup T} l(x) = \max\Bigl\{\sup_{x \text{ in } S} l(x), \ \sup_{x \text{ in } T} l(x)\Bigr\} = \max\{q_S(l), q_T(l)\}.$$


Ex 16. Suppose all $p_j$ are positive. Define

$$y_k = \sum_{1}^{k} p_j x_j \Big/ \sum_{1}^{k} p_j.$$

Then

$$y_{k+1} = \frac{\sum_1^k p_j}{\sum_1^{k+1} p_j}\, y_k + \frac{p_{k+1}}{\sum_1^{k+1} p_j}\, x_{k+1}.$$

We claim that all points $y_k$ belong to the convex set containing $x_1, \ldots, x_m$. This follows inductively, since $y_1 = x_1$, and $y_{k+1}$ lies on the line segment whose endpoints are $y_k$ and $x_{k+1}$. Finally,

$$y_m = \sum_{1}^{m} p_j x_j.$$


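Ex 19. Denote by $P_1, P_2, P_3$ the following $3 \times 3$ permutation matrices

$$P_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad P_2 = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}, \quad P_3 = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}.$$

Then

$$\frac{1}{3}P_1 + \frac{1}{3}P_2 + \frac{1}{3}P_3 = \frac{1}{3}\begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix} = M.$$

Similarly define

$$P_4 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}, \quad P_5 = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad P_6 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}.$$

Then

$$\frac{1}{3}P_4 + \frac{1}{3}P_5 + \frac{1}{3}P_6 = M.$$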


Ex 20. A set $S$ in Euclidean space is open if for every point $x$ in $S$ there is a ball $\| y - x \| < \epsilon$ centered at $x$ that belongs to $S$. Suppose $S$ is convex and open; that means that there exist positive numbers $\epsilon_i$ such that $x + te_i$ belongs to $S$ for $|t| < \epsilon_i$; here $e_i$ denotes the unit vectors. Denote $\min \epsilon_i$ by $\epsilon$; it follows that the points $x \pm \epsilon e_i$, $i = 1, \ldots, n$ belong to $S$. Since $S$ is convex, the convex hull of these points belongs to $S$; this convex hull contains a ball of radius $\epsilon/\sqrt{n}$ centered at $x$.

The converse is obvious.

CHAPTER 13

Ex 3. We claim that the sign of equality holds in (21). For if not, $S$ would be larger than $s$, contrary to (20); this shows that the supremum in (16) is a maximum.

Replacing $Y$, $y$, and $j$ by $-Y$, $-y$, $-j$ turns the sup problem into inf, and vice versa. This shows that the inf in (18) is a minimum.

CHAPTER 14

Ex 2. $x - z = (x - y) + (y - z)$; apply the subadditive rule (1).


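Ex 5. From the definition of the $|x|_p$ and $|x|_\infty$ norms we see that

$$|x|_\infty^p \le |x|_p^p \le n\,|x|_\infty^p.$$

Take the $p$th root:

$$|x|_\infty \le |x|_p \le n^{1/p}\,|x|_\infty.$$

Since $n^{1/p}$ tends to 1 as $p$ tends to $\infty$, $|x|_\infty = \lim_{p \to \infty} |x|_p$ follows.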
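The two-sided bound in Ex 5 can be observed numerically. The sketch below is a floating-point illustration under an assumed sample vector, with helper names chosen for this example; it checks the bound for several values of $p$ and shows the $p$-norm approaching the max norm.

```python
def p_norm(x, p):
    """The p-norm of x: (sum |x_j|^p)^(1/p)."""
    return sum(abs(c) ** p for c in x) ** (1.0 / p)

def max_norm(x):
    """The infinity norm of x: max |x_j|."""
    return max(abs(c) for c in x)

def bounds_hold(x, p):
    """Check |x|_inf <= |x|_p <= n^(1/p) |x|_inf up to rounding error."""
    n = len(x)
    lower = max_norm(x)
    upper = n ** (1.0 / p) * lower
    slack = 1e-12 * upper  # tolerance for floating-point rounding
    return lower - slack <= p_norm(x, p) <= upper + slack

x = [3.0, -4.0, 1.0]  # hypothetical sample vector
for p in (1, 2, 8, 64, 200):
    print(p, p_norm(x, p), bounds_hold(x, p))
```

The upper bound $n^{1/p}|x|_\infty$ is what forces convergence: for this vector it is already within about one percent of $|x|_\infty$ at $p = 200$.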
Ex 6. Introduce a basis and represent the points by arrays of real numbers. Since all norms are equivalent, it suffices to prove completeness in the $|x|_\infty$ norm.

Let $\{x_n\}$ be a convergent sequence in the $|x|_\infty$ norm. Denote the components of $x_n$ by $x_{n,j}$. It follows from $|x_n - x_m|_\infty \to 0$ that

$$|x_{n,j} - x_{m,j}| \to 0$$

for every $j$. Since the real numbers are complete, it follows that

$$\lim_{n \to \infty} x_{n,j} = x_j.$$

Denote by $x$ the vector with components $x_j$; it follows that

$$\lim |x_n - x|_\infty = 0.$$

For another proof see Theorem 3.

CHAPTER 15

Ex 1. According to (1), $|Tx| \le c|x|$. Apply this to $|T(x_n - x)| \le c|x_n - x|$.


Ex 3. We have shown above that

$$(I - R)T_n = I - R^{n+1}.$$

Multiply both sides on the left by $S^{-1}$:

$$T_n - S^{-1} = -S^{-1}R^{n+1}.$$

Therefore

$$|T_n - S^{-1}| \le |S^{-1}R^{n+1}| \le |S^{-1}|\,|R^{n+1}|.$$

Since $|R| < 1$, $|R^{n+1}| \le |R|^{n+1}$ tends to zero as $n \to \infty$.

Ex 5. Decompose $n$ modulo $m$:

$$n = km + r, \qquad 0 \le r < m.$$

Then

$$R^n = R^{km + r} = (R^m)^k R^r,$$

therefore

$$|R^n| \le |R^m|^k\,|R^r|.$$

As $n$ tends to $\infty$, so does $k$; therefore if $|R^m| < 1$, $|R^n|$ tends to zero as $n$ tends to $\infty$. That is all that was used in the proof in Ex. 3, that $T_n$ tends in norm to $S^{-1}$.


Ex 6. The components of $y = Tx$ are

$$y_i = \sum_j t_{ij}x_j.$$

Since $|x_j| \le |x|_\infty$,

$$|y_i| \le \sum_j |t_{ij}|\,|x|_\infty.$$

So $|y|_\infty \le \max_i \sum_j |t_{ij}|\,|x|_\infty$; (23) follows.

CHAPTER 16

Ex 1. What has to be shown is that if $Px \le \lambda x$, then $\lambda \ge \lambda(P)$. To see this consider $P^T$, the transpose of $P$; it, too, has positive entries, so by Theorem 1 it has a dominant eigenvalue $\lambda(P^T)$ and a corresponding nonnegative eigenvector $k$:

$$P^Tk = \lambda(P^T)k.$$

Take the scalar product of $Px \le \lambda x$ with $k$; since $k$ is a nonnegative vector

$$(Px, k) \le \lambda(x, k).$$

The left side equals $(x, P^Tk) = \lambda(P^T)(x, k)$. Since $x$ and $k$ are nonnegative vectors, $(x, k)$ is positive, and we conclude that $\lambda(P^T) \le \lambda$. But $P$ and $P^T$ have the same eigenvalues, and therefore the largest eigenvalue $\lambda(P)$ of $P$ equals the largest eigenvalue $\lambda(P^T)$ of $P^T$.


Ex 2. Denote by $\mu$ the dominant eigenvalue of $P^m$, and by $k$ the associated eigenvector:

$$P^mk = \mu k.$$

Let $P$ act on this relation:

$$P^{m+1}k = P^mPk = \mu Pk,$$

which shows that $Pk$, too, is an eigenvector of $P^m$ with eigenvalue $\mu$. Since the dominant eigenvalue has multiplicity one, $Pk = ck$. Repeated application of $P$ shows that $P^mk = c^mk$. Therefore $c^m = \mu$. From $Pk = ck$ we deduce that $c$ is real and positive; therefore it is the real root $\mu^{1/m}$. Since the entries of $k$ are positive, $c = \mu^{1/m}$ is the dominant eigenvalue of $P$.

APPENDIX 1


Special Determinants


There are some classes of matrices whose determinants can be expressed by compact 
algebraic formulas. We give some interesting examples. 


Definition. A Vandermonde matrix is a square matrix whose columns form a geometric progression. That is, let $a_1, \ldots, a_n$ be $n$ scalars; then $V(a_1, \ldots, a_n)$ is the matrix

$$V(a_1, \ldots, a_n) = \begin{pmatrix} 1 & \cdots & 1 \\ a_1 & \cdots & a_n \\ \vdots & & \vdots \\ a_1^{n-1} & \cdots & a_n^{n-1} \end{pmatrix}. \tag{1}$$

Theorem 1.

$$\det V(a_1, \ldots, a_n) = \prod_{j > i} (a_j - a_i). \tag{2}$$

Proof. Using formula (16) of Chapter 5 for the determinant, we conclude that $\det V$ is a polynomial in the $a_i$ of degree less than or equal to $n(n - 1)/2$. Whenever two of the scalars $a_i$ and $a_j$, $i \neq j$, are equal, $V$ has two equal columns and so its determinant is zero; therefore, according to the factor theorem of algebra, $\det V$ is divisible by $a_j - a_i$. It follows that $\det V$ is divisible by the product

$$\prod_{j > i} (a_j - a_i).$$

This product has degree $n(n - 1)/2$, the same as the degree of $\det V$. Therefore

$$\det V = c_n \prod_{j > i} (a_j - a_i), \tag{2$'$}$$

$c_n$ a constant. We claim that $c_n = 1$; to see this we use the Laplace expansion (26) of Chapter 5 for $\det V$ with respect to the last column, that is, $j = n$. We get in this way an expansion of $\det V$ in powers of $a_n$; the coefficient of $a_n^{n-1}$ is $\det V(a_1, \ldots, a_{n-1})$. On the other hand, the coefficient of $a_n^{n-1}$ on the right of (2)' is $c_n \prod_{n > j > i} (a_j - a_i)$. Using expression (2)' for $V(a_1, \ldots, a_{n-1})$, we deduce that $c_n = c_{n-1}$. An explicit calculation shows that $c_2 = 1$; hence by induction $c_n = 1$ for all $n$, and (2) follows. □
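Formula (2) can be spot-checked by computing the determinant directly. The sketch below is an illustration, not the proof's method: it evaluates the determinant as a signed sum over permutations and compares it with the product of differences. The function names and sample scalars are assumptions made for this example, and the permutation sum is practical only for small $n$.

```python
from itertools import combinations, permutations

def det(m):
    """Determinant as a signed sum over permutations (small matrices only)."""
    n = len(m)
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                         if perm[i] > perm[j])
        term = -1 if inversions % 2 else 1
        for i in range(n):
            term *= m[i][perm[i]]
        total += term
    return total

def vandermonde(a):
    """Row k holds the k-th powers of a_1, ..., a_n, so columns are geometric."""
    return [[x ** k for x in a] for k in range(len(a))]

def product_of_differences(a):
    """Right side of (2): product of (a_j - a_i) over all pairs j > i."""
    result = 1
    for i, j in combinations(range(len(a)), 2):
        result *= a[j] - a[i]
    return result

a = [2, -1, 5, 3]  # hypothetical sample scalars
print(det(vandermonde(a)), product_of_differences(a))
```

Repeating a scalar, for example `[2, 5, 5, 3]`, makes two columns equal, and both sides vanish, which is the divisibility argument of the proof in miniature.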


Definition. Let $a_1, \ldots, a_n$ and $b_1, \ldots, b_n$ be $2n$ scalars. The Cauchy matrix $C(a_1, \ldots, a_n; b_1, \ldots, b_n)$ is the $n \times n$ matrix whose $ij$th element is $1/(a_i + b_j)$:

$$C(a, b) = \left(\frac{1}{a_i + b_j}\right).$$

Theorem 2.

$$\det C(a, b) = \frac{\prod_{j > i} (a_j - a_i)(b_j - b_i)}{\prod_{i,j} (a_i + b_j)}. \tag{3}$$


Proof. Using formula (16) of Chapter 5 for the determinant of $C(a, b)$, and using the common denominator for all terms we can write

$$\det C(a, b) = \frac{P(a, b)}{\prod_{i,j} (a_i + b_j)}, \tag{4}$$

where $P(a, b)$ is a polynomial whose degree is less than or equal to $n^2 - n$. Whenever two of the scalars $a_i$ and $a_j$ are equal, the $i$th and $j$th row of $C(a, b)$ are equal; likewise, when $b_i = b_j$ the $i$th and $j$th column of $C(a, b)$ are equal. In either case, $\det C(a, b) = 0$; therefore, by the factor theorem of algebra, the polynomial $P(a, b)$ is divisible by $(a_j - a_i)$ and by $(b_j - b_i)$, and therefore by the product

$$\prod_{j > i} (a_j - a_i)(b_j - b_i).$$

The degree of this product is $n^2 - n$, the same as the degree of $P$; therefore,

$$P(a, b) = c_n \prod_{j > i} (a_j - a_i)(b_j - b_i), \tag{4$'$}$$

$c_n$ a constant. We claim that $c_n = 1$; to see this we use the Laplace expansion for $C(a, b)$ with respect to the last column, $j = n$; the term corresponding to the element $1/(a_n + b_n)$ is

$$\det C(a_1, \ldots, a_{n-1}; b_1, \ldots, b_{n-1}) \, \frac{1}{a_n + b_n}.$$

Now set $a_n = b_n = d$; we get from (4) and (4)' that

$$\det C(a_1, \ldots, d; b_1, \ldots, d) = c_n \, \frac{\prod_{n > i} (d - a_i)(d - b_i) \prod_{n > j > i} (a_j - a_i)(b_j - b_i)}{2d \prod_{n > i} (d + a_i)(d + b_i) \prod_{i,j < n} (a_i + b_j)}.$$

From the Laplace expansion we get

$$\det C(a_1, \ldots, d; b_1, \ldots, d) = \frac{1}{2d} \det C(a_1, \ldots, a_{n-1}; b_1, \ldots, b_{n-1}) + \text{other terms}.$$

Multiplying both expressions by $2d$ and setting $d = 0$, and using (4) and (4)' to express $\det C(a_1, \ldots, a_{n-1}; b_1, \ldots, b_{n-1})$, we deduce that $c_n = c_{n-1}$. An explicit calculation shows that $c_1 = 1$, so we conclude by induction that $c_n = 1$ for all $n$; (3) now follows from (4) and (4)'. □
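Formula (3) can be checked in exact rational arithmetic. The sketch below is a numerical illustration under assumed sample scalars, with function names invented for the example; the determinant is again computed as a signed permutation sum, which limits it to small $n$.

```python
from fractions import Fraction
from itertools import permutations

def det(m):
    """Determinant as a signed sum over permutations (small matrices only)."""
    n = len(m)
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                         if perm[i] > perm[j])
        term = -1 if inversions % 2 else 1
        for i in range(n):
            term *= m[i][perm[i]]
        total += term
    return total

def cauchy(a, b):
    """Cauchy matrix with ij-th entry 1 / (a_i + b_j), as exact fractions."""
    return [[Fraction(1, ai + bj) for bj in b] for ai in a]

def cauchy_formula(a, b):
    """Right side of (3)."""
    n = len(a)
    numerator, denominator = Fraction(1), Fraction(1)
    for i in range(n):
        for j in range(n):
            denominator *= a[i] + b[j]
            if j > i:
                numerator *= (a[j] - a[i]) * (b[j] - b[i])
    return numerator / denominator

a, b = [1, 2, 4], [1, 3, 5]  # hypothetical sample scalars with a_i + b_j != 0
print(det(cauchy(a, b)), cauchy_formula(a, b))
```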


Note: Every minor of a Cauchy matrix is a Cauchy matrix. 


EXERCISE 1. Let

$$p(s) = x_1 + x_2 s + \cdots + x_n s^{n-1}$$

be a polynomial of degree less than $n$. Let $a_1, \ldots, a_n$ be $n$ distinct numbers, and let $p_1, \ldots, p_n$ be $n$ arbitrary complex numbers; we wish to choose the coefficients $x_1, \ldots, x_n$ so that

$$p(a_j) = p_j, \qquad j = 1, \ldots, n.$$

This is a system of $n$ linear equations for the $n$ coefficients $x_i$. Find the matrix of this system of equations, and show that its determinant is $\neq 0$.


EXERCISE 2. Find an algebraic formula for the determinant of the matrix whose $ij$th element is

$$\frac{1}{1 + a_i a_j};$$

here $a_1, \ldots, a_n$ are arbitrary scalars.


APPENDIX 2 


The Pfaffian 


Let $A$ be an $n \times n$ antisymmetric matrix:

$$A^T = -A.$$

We have seen in Chapter 5 that a matrix and its transpose have the same determinant. We have also seen that the determinant of $-A$ is $(-1)^n \det A$, so

$$\det A = \det A^T = \det(-A) = (-1)^n \det A.$$

When $n$ is odd, it follows that $\det A = 0$; what can we say about the even case?

Suppose the entries of $A$ are real; then the eigenvalues come in complex conjugate pairs. On the other hand, according to the spectral theory of anti-self-adjoint matrices, the eigenvalues of $A$ are purely imaginary. It follows that the eigenvalues of $A$ are $(-i\lambda_1, \ldots, -i\lambda_{n/2}, i\lambda_1, \ldots, i\lambda_{n/2})$. Their product is $(\prod \lambda_j)^2$, a nonnegative number; since the determinant of a matrix is the product of its eigenvalues, we conclude that the determinant of an antisymmetric matrix of even order with real entries is nonnegative.

Far more is true:

Theorem of Cayley. The determinant of an antisymmetric matrix $A$ of even order is the square of a homogeneous polynomial of degree $n/2$ in the entries of $A$:

$$\det A = P^2.$$

$P$ is called the Pfaffian.

EXERCISE 1. Verify by a calculation Cayley's theorem for $n = 4$.

Proof. The proof is based on the following lemma.


Lemma 1. There is a matrix $C$ whose entries are polynomials in the entries of the antisymmetric matrix $A$ such that

$$B = CAC^T \tag{1}$$

is antisymmetric and tridiagonal, that is, $b_{ij} = 0$ for $|i - j| > 1$. Furthermore, $\det C \not\equiv 0$.

Proof. We construct $C$ as a product

$$C = C_{n-2} \cdots C_2 C_1.$$

$C_1$ is required to have the following properties:

(i) $B_1 = C_1AC_1^T$ has zeros for the last $(n - 2)$ entries in its first column.

(ii) The first row of $C_1$ is $e_1 = (1, 0, \ldots, 0)$, its first column is $e_1^T$.

It follows from (ii) that $C_1^T$ maps $e_1^T$ into $e_1^T$; therefore the first column of $B_1$, $B_1e_1^T$, is $C_1AC_1^Te_1^T = C_1Ae_1^T = C_1a$, where $a$ denotes the first column of $A$. To satisfy (i) we have to choose the rest of $C_1$ so that the last $(n - 2)$ entries of $C_1a$ are zero. This requires the last $n - 2$ rows of $C_1$ to be orthogonal to $a$. This is easily accomplished: set the second row of $C_1$ equal to $e_2 = (0, 1, 0, \ldots, 0)$, the third row $(0, a_3, -a_2, 0, \ldots, 0)$, the fourth row $(0, 0, a_4, -a_3, 0, \ldots, 0)$, and so on, where $a_1, \ldots, a_n$ are the entries of the vector $a$. Clearly

$$\det C_1 = a_2 a_3 \cdots a_{n-1}$$

is a nonzero polynomial.

We proceed recursively; we construct $C_2$ so its first two rows are $e_1$ and $e_2$, its first two columns $e_1^T$ and $e_2^T$. Then the first column of $B_2 = C_2B_1C_2^T$ has zero for its last $n - 2$ entries. As before, we fill in the rest of $C_2$ so that the second column of $B_2$ has zeros for its last $n - 3$ entries. Clearly, $\det C_2$ is a nonzero polynomial.

After $(n - 2)$ steps we end with $C = C_{n-2} \cdots C_1$, having the property that $B = CAC^T$ has zero entries below the first subdiagonal, that is, $b_{ij} = 0$ for $i > j + 1$. $B^T = CA^TC^T = -B$, that is, $B$ is antisymmetric. It follows that its only nonzero entries lie on the sub and super diagonals $j = i \pm 1$. Since $B^T = -B$, $b_{i,i-1} = -b_{i-1,i}$. Furthermore, by construction,

$$\det C = \prod \det C_j \not\equiv 0. \tag{2}$$

□

What is the determinant of an antisymmetric, tridiagonal matrix of even order? Consider the $4 \times 4$ case

$$B = \begin{pmatrix} 0 & a & 0 & 0 \\ -a & 0 & b & 0 \\ 0 & -b & 0 & c \\ 0 & 0 & -c & 0 \end{pmatrix}.$$

Its determinant $\det B = a^2c^2$ is the square of a single product. The same is true in general: the determinant of a tridiagonal antisymmetric matrix $B$ of even order is the square of a single product,

$$\det B = \Bigl(\prod_{k=1}^{n/2} b_{2k-1,\,2k}\Bigr)^2. \tag{3}$$

Using the multiplicative property of determinants, and that $\det C^T = \det C$, we deduce from (1) that

$$\det B = (\det C)^2 \det A;$$

combining this with (2) and (3) we deduce that $\det A$ is the square of a rational function in the entries of $A$. To conclude we need therefore Lemma 2.


Lemma 2. If a polynomial $P$ in $n$ variables is the square of a rational function $R$, $R$ is a polynomial.

Proof. For functions of one variable this follows by elementary algebra; so we can conclude that for each fixed variable $x$, $R$ is a polynomial in $x$, with coefficients from the field of rational functions in the remaining variables. It follows that there exists a $k$ such that the $k$th partial derivative of $R$ with respect to any of the variables is zero. From this it is easy to deduce, by induction on the number of variables, that $R$ is a polynomial in all variables. □
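For $n = 4$ Cayley's theorem can be spot-checked numerically. The sketch below assumes the standard expression $a_{12}a_{34} - a_{13}a_{24} + a_{14}a_{23}$ for the $4 \times 4$ Pfaffian, which is not written out in the text above, and compares its square with the determinant. Function names and sample entries are chosen for this illustration; a numerical check of this kind does not replace the calculation asked for in Exercise 1.

```python
from itertools import permutations

def det(m):
    """Determinant as a signed sum over permutations (small matrices only)."""
    n = len(m)
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                         if perm[i] > perm[j])
        term = -1 if inversions % 2 else 1
        for i in range(n):
            term *= m[i][perm[i]]
        total += term
    return total

def antisymmetric4(a12, a13, a14, a23, a24, a34):
    """4 x 4 antisymmetric matrix with the given entries above the diagonal."""
    return [[0, a12, a13, a14],
            [-a12, 0, a23, a24],
            [-a13, -a23, 0, a34],
            [-a14, -a24, -a34, 0]]

def pfaffian4(a12, a13, a14, a23, a24, a34):
    """Assumed standard 4 x 4 Pfaffian: a12*a34 - a13*a24 + a14*a23."""
    return a12 * a34 - a13 * a24 + a14 * a23

entries = (1, 2, 3, 4, 5, 6)  # hypothetical sample entries
print(det(antisymmetric4(*entries)), pfaffian4(*entries) ** 2)
```

In the tridiagonal case $a_{13} = a_{14} = a_{24} = 0$ the Pfaffian reduces to $a_{12}a_{34}$, in agreement with $\det B = a^2c^2$.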


J08 VII. Lattices and Convex Bodies 


Applying Lemma 6.1, we finally conclude that 
f | At, n (Ag + yil duy = vol 1 My 
n 


and hence (6.2.3) follows. Now we observe that 


1 
d My Ar| ) dz =a Y^ vola i My <1 
LH | (Em A |) dz a> volg- My € 


kx ka 


by (6.2.2). Therefore, the average value of the number of points in $(M \setminus M_0) \cap \Lambda_x$ is strictly smaller than 1. Therefore, there must be an $x \in \Pi$ such that the intersection $(M \setminus M_0) \cap \Lambda_x$ is empty. By (6.2.1) we conclude that $M_0 \cap \Lambda_x$ consists of at most the zero vector, which completes the proof. □


PROBLEMS. 


1. Let à be à Lebesgue integrable function on R7 and let A C R^ be a lattice, 
Prove that there exists a 2 € R^ such that 


u+) e l^ jd 
b ld det A Po pops 


we A 


2. Let ó be a bounded Riemann integrable function vanishing outside a bounded 
region in Ri, d > 1, and let c bea positive number, Prove that there exists a uni- 
modular lattice A c R^ such that 


2. diu) «€ « «f eir) dr. 
Ri 


we A\ {0} 


3. Let $M \subset \mathbb{R}^d$, $d > 1$, be a bounded centrally symmetric Jordan measurable set and let $\delta > (1/2)(\operatorname{vol} M)$. Prove that there exists a lattice $\Lambda \subset \mathbb{R}^d$ such that $\det \Lambda = \delta$ and $M$ does not contain any lattice point, except possibly 0.


Hint: Either M does not contain any non-zero lattice point or it contains at 
least two. 


(6.3) Corollary. For any $\sigma < 2^{-d}$ there exists a $d$-dimensional lattice $\Lambda \subset \mathbb{R}^d$ whose packing density is at least $\sigma$. Similarly, for any


TU 4(d/2 4 1) i l 
es Ves (io) 


there exists a $d$-dimensional lattice $\Lambda \subset \mathbb{R}^d$ with packing radius at least $\rho$ and determinant 1.

for all $x$ and $y$, satisfies

$$S^TJS = J \tag{4}$$

and conversely.

Proof. $(Sx, JSy) = (x, S^TJSy)$. If this is equal to $(x, Jy)$ for all $x$, $y$, then $S^TJSy = Jy$ for all $y$. □


A real matrix $S$ that satisfies (4) is called a symplectic matrix. The set of all symplectic matrices is denoted as $Sp(n)$.

Theorem 2. (i) Symplectic matrices form a group under matrix multiplication.

(ii) If $S$ is symplectic, so is its transpose $S^T$.

(iii) A symplectic matrix $S$ is similar to its inverse $S^{-1}$.

Proof. (i) It follows from (4) that every symplectic matrix is invertible. That they form a group follows from (3). To verify (ii), take the inverse of (4); using (2) we get

$$S^{-1}J(S^T)^{-1} = J.$$

Multiplying by $S$ on the left, $S^T$ on the right shows that $S^T$ satisfies (4).

To deduce (iii) multiply (4) by $S^{-1}$ on the right and $J^{-1}$ on the left. We get that $J^{-1}S^TJ = S^{-1}$, that is, that $S^{-1}$ is similar to $S^T$. Since $S^T$ is similar to $S$, (iii) follows. □


Theorem 3. Let $S(t)$ be a differentiable function of the real variable $t$, whose values are symplectic matrices. Define $G(t)$ by

$$\frac{d}{dt}S = GS. \tag{5}$$

Then $G$ is of the form

$$G = JL(t), \qquad L \text{ self-adjoint}. \tag{6}$$

Conversely, if $S(t)$ satisfies (5) and (6) and $S(0)$ is symplectic, then $S(t)$ is a family of symplectic matrices.

Proof. For each $t$ (4) is satisfied; differentiate it with respect to $t$:

$$\left(\frac{d}{dt}S^T\right)JS + S^TJ\frac{d}{dt}S = 0.$$

Multiply by $S^{-1}$ on the right, $(S^T)^{-1}$ on the left:

$$(S^T)^{-1}\left(\frac{d}{dt}S^T\right)J + J\left(\frac{d}{dt}S\right)S^{-1} = 0. \tag{7}$$

We use (5) to define $G$:

$$G = \left(\frac{d}{dt}S\right)S^{-1}.$$

Taking the transpose we get

$$G^T = (S^T)^{-1}\frac{d}{dt}S^T.$$

Setting these into (7) gives

$$G^TJ + JG = 0,$$

from which (6) follows. □


EXERCISE 2. Prove the converse. 


We turn now to the spectrum of a symplectic matrix $S$. Since $S$ is real, its complex eigenvalues come in conjugate pairs, that is, if $\lambda$ is an eigenvalue, so is $\bar\lambda$. According to part (iii) of Theorem 2, $S$ and $S^{-1}$ are similar; since similar matrices have the same spectrum, it follows that if $\lambda$ is an eigenvalue of $S$, so is $\lambda^{-1}$ and it has the same multiplicity. Thus the eigenvalues of a symplectic matrix $S$ come in groups of four: $\lambda, \bar\lambda, \lambda^{-1}, \bar\lambda^{-1}$, with three exceptions:

(a) When $\lambda$ lies on the unit circle, that is, $|\lambda| = 1$, then $\lambda^{-1} = \bar\lambda$, so we only have a group of two.

(b) When $\lambda$ is real, $\bar\lambda = \lambda$, so we only have a group of two.

(c) $\lambda = 1$ or $-1$.

The possibility is still open that $\lambda = \pm 1$ are simple eigenvalues of $S$; but this cannot occur according to

Theorem 4. For a symplectic matrix $S$, $\lambda = 1$ or $-1$ cannot be a simple eigenvalue.

Proof. We argue indirectly: suppose, say, that $\lambda = -1$ is a simple eigenvalue, with eigenvector $h$:

$$Sh = -h. \tag{8}$$

Multiplying both sides by $S^TJ$ and using (4) we get

$$Jh = -S^TJh, \tag{8$'$}$$

which shows that $Jh$ is eigenvector of $S^T$ with eigenvalue $-1$.

We choose any self-adjoint, positive matrix $L$, and set $G = JL$. We define the one-parameter family of matrices $S(t)$ as $e^{tG}S$; it satisfies

$$\frac{d}{dt}S(t) = GS(t), \qquad S(0) = S. \tag{9}$$

According to Theorem 3, $S(t)$ is symplectic for all $t$.

If $S(0)$ has $-1$ as eigenvalue of multiplicity one, then for $t$ small, $S(t)$ has a single eigenvalue near $-1$. This eigenvalue $\lambda$ equals $-1$, for otherwise $\lambda^{-1}$ would be another eigenvalue near $-1$. According to Theorem 8 of Chapter 9, the eigenvector $h(t)$ is a differentiable function of $t$. Differentiating $Sh = -h$ yields

$$\left(\frac{d}{dt}S\right)h + Sh_t = -h_t, \qquad h_t = \frac{d}{dt}h.$$

Using (9) and (8) we get

$$Gh = h_t + Sh_t.$$

Form the scalar product with $Jh$; using (8)' we get

$$(Gh, Jh) = (h_t, Jh) + (Sh_t, Jh) = (h_t, Jh) + (h_t, S^TJh) = (h_t, Jh) - (h_t, Jh) = 0. \tag{10}$$

According to (6), $G = JL$; set this into (10); since by (2), $J^TJ = I$, we have

$$(JLh, Jh) = (Lh, J^TJh) = (Lh, h) = 0. \tag{10$'$}$$

Since $L$ was chosen to be self-adjoint and positive, $h = 0$, a contradiction. □


EXERCISE 3. Prove that plus or minus 1 cannot be an eigenvalue of odd multiplicity of a symplectic matrix.


Taking the determinant of (4), using the multiplicative property, and that $\det S^T = \det S$ we deduce that $(\det S)^2 = 1$ so that $\det S = 1$ or $-1$. More is true.

Theorem 5. The determinant of a symplectic matrix $S$ is 1.

Proof. Since we already know that $(\det S)^2 = 1$, we only have to exclude the possibility that $\det S$ is negative. The determinant of a matrix is the product of its eigenvalues. The complex eigenvalues come in conjugate pairs; their product is positive. The real eigenvalues $\neq 1, -1$ come in pairs $\lambda, \lambda^{-1}$, and their product is positive. According to Exercise 3, $-1$ is an eigenvalue of even multiplicity; so the product of the eigenvalues is positive. □


We remark that it can be shown that the space $Sp(n)$ of symplectic matrices is connected. Since $(\det S)^2 = 1$ and since $S = I$ has determinant 1, it follows that $\det S = 1$ for all $S$ in $Sp(n)$.
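Relation (4) and Theorem 5 can be tested on a concrete matrix. The sketch below assumes the standard block form $J = \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix}$ with $2 \times 2$ blocks; the definition of $J$ does not appear in this excerpt, so that choice is an assumption, as are the function names and sample matrices. Block-triangular matrices with a symmetric off-diagonal block satisfy (4), and so does their product by part (i) of Theorem 2.

```python
from itertools import permutations

def det(m):
    """Determinant as a signed sum over permutations (small matrices only)."""
    n = len(m)
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                         if perm[i] > perm[j])
        term = -1 if inversions % 2 else 1
        for i in range(n):
            term *= m[i][perm[i]]
        total += term
    return total

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

# Assumed standard form of J with 2 x 2 blocks: [[0, I], [-I, 0]].
J = [[0, 0, 1, 0], [0, 0, 0, 1], [-1, 0, 0, 0], [0, -1, 0, 0]]

def is_symplectic(S):
    """Check relation (4): S^T J S = J."""
    return matmul(matmul(transpose(S), J), S) == J

# [[I, B], [0, I]] and [[I, 0], [C, I]] with B and C symmetric.
upper = [[1, 0, 1, 2], [0, 1, 2, 3], [0, 0, 1, 0], [0, 0, 0, 1]]
lower = [[1, 0, 0, 0], [0, 1, 0, 0], [2, -1, 1, 0], [-1, 1, 0, 1]]
S = matmul(upper, lower)

print(is_symplectic(S), is_symplectic(transpose(S)), det(S))
```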

Symplectic matrices first appeared in Hamiltonian mechanics, governed by equations of the form

$$\frac{d}{dt}u = JH_u, \tag{11}$$

where $u(t)$ lies in $\mathbb{R}^{2n}$, $H$ is some smooth function in $\mathbb{R}^{2n}$, and $H_u$ is its gradient.


Definition. A nonlinear mapping $u \to v$ is called a canonical transformation if its Jacobian matrix $\partial v/\partial u$ is symplectic.


Theorem 6. A canonical transformation changes every Hamiltonian equation (11) into another equation of Hamiltonian form:

$$\frac{d}{dt}v = JK_v,$$

where $K(v(u)) = H(u)$.


EXERCISE 4. Verify Theorem 6. 


APPENDIX 4 


Tensor Product 


For an analyst, a good way to think of the tensor product of two linear spaces is to take one space as the space of polynomials in $x$ of degree less than $n$, the other as the polynomials in $y$ of degree less than $m$. Their tensor product is the space of polynomials in $x$ and $y$, of degree less than $n$ in $x$, less than $m$ in $y$. A natural basis for polynomials are the powers $1, x, \ldots, x^{n-1}$ and $1, y, \ldots, y^{m-1}$, respectively; a natural basis for polynomials in $x$ and $y$ is $x^iy^j$, $i < n$, $j < m$.

This sets the stage for defining the tensor product of two linear spaces $U$ and $V$ as follows: Let $\{e_i\}$ be a basis of the linear space $U$, $\{f_j\}$ a basis for the linear space $V$. Then $\{e_i \otimes f_j\}$ is a basis for their tensor product $U \otimes V$.

It follows from this definition that

$$\dim U \otimes V = (\dim U)(\dim V). \tag{1}$$

The definition, however, is ugly, since it uses basis vectors.


EXERCISE I. Establish a natural isomorphism between tensor products defined 
with respect to two pairs of distinct bases. 


Happily, we can define $U \otimes V$ in an invariant manner.

Take the collection of all formal sums

$$\sum u_i \otimes v_i, \tag{2}$$

where $u_i$ and $v_i$ are arbitrary vectors in $U$ and $V$, respectively. Clearly, these sums form a linear space.

Sums of the form

$$(u_1 + u_2) \otimes v - u_1 \otimes v - u_2 \otimes v \tag{3}$$

and

$$u \otimes (v_1 + v_2) - u \otimes v_1 - u \otimes v_2 \tag{3$'$}$$

are special cases of (2). These, and all their linear combinations, are called null sums. Using these concepts we can give a basis-free definition of tensor product.


Definition. The tensor product $U \otimes V$ of two finite-dimensional linear spaces $U$ and $V$ is the quotient space of the space of all formal sums (2) modulo all null sums (3), (3)'.


This definition is basis-free, but a little awkward. Happily, there is an elegant way of presenting it.


Theorem 1. There is a natural isomorphism between $U \otimes V$ as defined above and $\mathcal{L}(U', V)$, the space of all linear mappings of $U'$ into $V$, where $U'$ is the dual of $U$.


Proof. Let $\sum u_i \otimes v_i$ be a representative of an equivalence class in the quotient space. For any $l$ in $U'$, assign to $l$ the image $\sum l(u_i)v_i$ in $V$. Since every null sum is mapped into zero, this mapping depends only on the equivalence class.

The mapping $L$,

$$l \to \sum l(u_i)v_i,$$

is clearly linear and the assignment

$$\Bigl\{\sum u_i \otimes v_i\Bigr\} \to L \tag{4}$$

also is linear. It is not hard to show that every $L$ in $\mathcal{L}(U', V)$ is the image of some vector in $U \otimes V$. □


EXERCISE 2. Verify that (4) maps $U \otimes V$ onto $\mathcal{L}(U', V)$.


Theorem 1 treats the spaces $U$ and $V$ asymmetrically. The roles of $U$ and $V$ can be interchanged, leading to an isomorphism of $U \otimes V$ and $\mathcal{L}(V', U)$. The dual of a map $L: U' \to V$ is of course a map $L': V' \to U$.

When $U$ and $V$ are equipped with real Euclidean structure, there is a natural way to equip $U \otimes V$ with Euclidean structure. As before, there are two ways of going about it. One is to choose orthonormal bases $\{e_j\}$, $\{f_j\}$ in $U$ and $V$ respectively, and declare $\{e_i \otimes f_j\}$ to be an orthonormal basis for $U \otimes V$. It remains to be shown that this Euclidean structure is independent of the choice of the orthonormal bases; this is easily done, based on the following lemma.

Lemma 2. Let $u$, $z$ be a pair of vectors in $U$, $v$, $w$ a pair of vectors in $V$. Then

$$(u \otimes v, z \otimes w) = (u, z)(v, w). \tag{5}$$

Proof. Expand $u$ and $z$ in terms of the $e_j$, $v$ and $w$ in terms of $f_j$:

$$u = \sum a_je_j, \quad z = \sum b_je_j, \quad v = \sum c_kf_k, \quad w = \sum d_kf_k.$$

Then

$$u \otimes v = \sum a_jc_k\, e_j \otimes f_k, \qquad z \otimes w = \sum b_jd_k\, e_j \otimes f_k;$$

so

$$(u \otimes v, z \otimes w) = \sum a_jc_kb_jd_k = \Bigl(\sum a_jb_j\Bigr)\Bigl(\sum c_kd_k\Bigr) = (u, z)(v, w). \qquad \square$$
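In coordinates with respect to orthonormal bases, Lemma 2 becomes a statement about arrays of products. The sketch below is an illustration with assumed sample vectors; it represents $u \otimes v$ by the list of all products $a_jc_k$, that is, by its coefficients in the basis $e_j \otimes f_k$, and compares the two sides of (5). The function names are chosen for this example.

```python
def tensor(u, v):
    """Coefficients of u (x) v in the basis e_j (x) f_k, listed row by row."""
    return [uj * vk for uj in u for vk in v]

def dot(x, y):
    """Euclidean scalar product of two coefficient arrays."""
    return sum(a * b for a, b in zip(x, y))

# Hypothetical sample vectors: u, z in a 2-dimensional U; v, w in a 3-dimensional V.
u, z = [1, 2], [3, -1]
v, w = [2, 0, 1], [1, 4, 2]

left = dot(tensor(u, v), tensor(z, w))
right = dot(u, z) * dot(v, w)
print(left, right)
```

The length of `tensor(u, v)` is the product of the two dimensions, in line with formula (1).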


Take the example presented at the beginning, where $U$ is the space of polynomials in $x$ of degree $< n$, and $V$ is the space of polynomials in $y$ of degree $< m$. Define in $U$ the square integral over an $x$-interval $A$ as the Euclidean norm, and in $V$ the square integral over a $y$-interval $B$. Then the Euclidean structure in $U \otimes V$ defined by (5) is the square integral over the rectangle $A \times B$.

We show now how to use the representation of $U \otimes V$ as $\mathcal{L}(U', V)$ to derive the Euclidean structure in $U \otimes V$ from the Euclidean structure of $U$ and $V$. Here $U' = U$, so $U \otimes V$ is $\mathcal{L}(U, V)$.

Let $M$ and $L$ belong to $\mathcal{L}(U, V)$, and let $L^*$ be the adjoint of $L$. We define

$$(M, L) = \operatorname{tr} L^*M. \tag{6}$$

Clearly this depends bilinearly on $M$ and $L$. In terms of orthonormal bases, $M$ and $L$ can be expressed as matrices $(m_{ij})$ and $(l_{ij})$, and $L^*$ as the transpose $(l_{ji})$. Then

$$\operatorname{tr} L^*M = \sum l_{ji}m_{ji}.$$

Setting $L = M$, we get

$$\| M \|^2 = (M, M) = \sum m_{ji}^2,$$

consistent with our previous definition.
Complex Euclidean structures can be handled the same way.

All of the foregoing is pretty dull stuff. To liven it up, here is a one-line proof of Schur's peculiar theorem from Chapter 10, Theorem 7: if $A = (A_{ij})$ and $B = (B_{ij})$ are positive symmetric $n \times n$ matrices then so is their entry by entry product $M = (A_{ij}B_{ij})$.


Proof. It was observed in Theorem 6 of Chapter 10 that every positive symmetric matrix can be written as a Gram matrix:

$$A_{ij} = (u_i, u_j), \qquad u_i \in U, \text{ linearly independent},$$
$$B_{ij} = (v_i, v_j), \qquad v_i \in V, \text{ linearly independent}.$$

Now define $g_i$ in $U \otimes V$ to be $u_i \otimes v_i$; by (5), $(g_i, g_j) = (u_i, u_j)(v_i, v_j) = A_{ij}B_{ij}$. This shows that $M$ is a Gram matrix, therefore nonnegative.
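The key step of this proof, that the entry by entry product is itself a Gram matrix, can be seen on a small example. The sketch below uses assumed sample vectors and invented helper names; it forms the Gram matrices of the $u_i$, of the $v_i$, and of $g_i = u_i \otimes v_i$, and compares the last with the entrywise product of the first two.

```python
def tensor(u, v):
    """Coefficients of u (x) v with respect to the product basis."""
    return [uj * vk for uj in u for vk in v]

def gram(vectors):
    """Gram matrix of scalar products (x_i, x_j)."""
    return [[sum(a * b for a, b in zip(x, y)) for y in vectors] for x in vectors]

def entrywise_product(A, B):
    """Entry by entry product (A_ij * B_ij)."""
    return [[a * b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]

# Hypothetical linearly independent sample vectors.
us = [[1, 0], [1, 2]]
vs = [[2, 1, 0], [0, 1, 3]]
gs = [tensor(u, v) for u, v in zip(us, vs)]

print(gram(gs))
print(entrywise_product(gram(us), gram(vs)))
```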


EXERCISE 3. Show that if $\{u_i\}$ and $\{v_i\}$ are linearly independent, so are $u_i \otimes v_i$. Show that $M_{ij}$ is positive.


EXERCISE 4. Let $u$ be a twice differentiable function of $x_1, \ldots, x_n$ defined in a neighborhood of a point $p$, where $u$ has a local minimum. Let $(A_{ij})$ be a symmetric, nonnegative matrix. Show that

$$\sum A_{ij}\frac{\partial^2 u}{\partial x_i \partial x_j}(p) \ge 0.$$


APPENDIX 5 


Lattices 


Definition. A lattice is a subset $L$ of a linear space $X$ over the reals with the following properties:


(i) $L$ is closed under addition and subtraction; that is, if $x$ and $y$ belong to $L$, so do $x + y$ and $x - y$.

(ii) $L$ is discrete, in the sense that any bounded (as measured in any norm) set of $X$ contains only a finite number of points of $L$.


An example of a lattice in $\mathbb{R}^n$ is the collection of points $x = (x_1, \ldots, x_n)$ with integer components $x_j$. The basic theorem of the subject says that this example is typical.


Theorem 1. Every lattice has an integer basis, that is, a collection of vectors in 
L such that every vector in the lattice can be expressed uniquely as a linear 
combination of basis vectors with integer coefficients. 


Proof. The dimension of a lattice $L$ is the dimension of the linear space it spans. Let $L$ be $k$-dimensional, and let $p_1, \ldots, p_k$ be a basis in $L$ for the span of $L$; that is, every vector $t$ in $L$ can be expressed uniquely as

$$t = \sum a_jp_j, \qquad a_j \text{ real}. \tag{1}$$

Consider now the subset of those vectors $t$ in $L$ which are of form (1) with $a_j$ between 0 and 1:

$$0 \le a_j \le 1, \qquad j = 1, \ldots, k. \tag{2}$$

This set is not empty, for it contains all vectors with $a_j = 0$ or 1. Since $L$ is discrete, there are only a finite number of vectors $t$ in $L$ of this form; denote by $q_1$ that vector $t$ of form (1), (2) for which $a_1$ is positive and as small as possible.


EXERCISE 1. Show that $a_1$ is a rational number.


Now replace $p_1$ by $q_1$ in the basis; every vector $t$ in $L$ can be expressed uniquely as

$$t = b_1q_1 + \sum_{2}^{k} b_jp_j, \qquad b_j \text{ real}. \tag{3}$$

We claim that $b_1$ occurring in (3) is an integer; for if not, we can subtract a suitable integer multiple of $q_1$ from $t$ so that the coefficient $b_1$ of $q_1$ lies strictly between 0 and 1:

$$0 < b_1 < 1.$$

If then we substitute into (3) the representation (1) of $q_1$ in terms of $p_1, \ldots, p_k$ and add or subtract suitable integer multiples of $p_2, \ldots, p_k$, we find that the $p_1$ coefficient of $t$ is positive and less than the $p_1$ coefficient of $q_1$. This contradicts our choice of $q_1$.

We complete our proof by an induction on $k$, the dimension of the lattice. Denote by $L_0$ the subset of $L$ consisting of those vectors $t$ in $L$ in whose representation of form (3), $b_1$ is zero. Clearly $L_0$ is a sublattice of $L$ of dimension $k - 1$; by induction hypothesis $L_0$ has an integer basis $q_2, \ldots, q_k$. By (3), $q_1, \ldots, q_k$ is an integer basis of $L$. □


An integer basis is far from unique as is shown in the following theorem. 


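Theorem 2. Let $L$ be an $n$-dimensional lattice in $\mathbb{R}^n$. Let $q_1, \ldots, q_n$ and $r_1, \ldots, r_n$ be two integer bases of $L$; denote by $Q$ and $R$ the matrices whose columns are $q_j$ and $r_j$, respectively. Then

$$Q = RM,$$

where $M$ is a unimodular matrix, that is, a matrix with integer entries whose determinant is plus or minus 1. (With the basis vectors stored as columns, the change of basis acts on the right; the form $Q = MR$ corresponds to storing them as rows.)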
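The relation between two integer bases can be illustrated in the plane. The sketch below is an example under assumed data, not a proof of Theorem 2. It stores basis vectors as the columns of $Q$ and $R$; in that convention the transition matrix is found by solving $RM = Q$, that is, $M = R^{-1}Q$. It then tests whether $M$ has integer entries and determinant $\pm 1$. The function names and sample bases are chosen for the illustration.

```python
from fractions import Fraction

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inverse2(m):
    """Exact inverse of an invertible 2 x 2 matrix."""
    d = Fraction(det2(m))
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]

def matmul2(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transition(Q, R):
    """Solve R M = Q for M; the columns of Q and R hold the basis vectors."""
    return matmul2(inverse2(R), Q)

def is_unimodular(M):
    integer_entries = all(Fraction(e).denominator == 1 for row in M for e in row)
    return integer_entries and abs(det2(M)) == 1

# Hypothetical lattice generated by r1 = (2, 0), r2 = (0, 1).
R = [[2, 0], [0, 1]]
# Second basis: q1 = r1 + r2 = (2, 1), q2 = r1 + 2 r2 = (2, 2).
Q = [[2, 2], [1, 2]]

M = transition(Q, R)
print(M, is_unimodular(M))
```

Replacing $Q$ by the columns $(4, 0)$, $(0, 1)$, which generate only a sublattice, gives a transition matrix with determinant 2, so the test fails as expected.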


EXERCISE 2. (i) Prove Theorem 2. 
(ii) Show that unimodular matrices form a group under multiplication. 


Definition. Let $L$ be a lattice in a linear space $X$. The dual of $L$, denoted as $L'$, is the subset of the dual $X'$ of $X$ consisting of those vectors $\xi$ for which $(t, \xi)$ is an integer for all $t$ in $L$.

Theorem 3. (i) The dual of an $n$-dimensional lattice in an $n$-dimensional linear space is an $n$-dimensional lattice.

(ii) $L'' = L$.


EXERCISE 3. Prove Theorem 3.


EXERCISE 4. Show that $L$ is discrete if and only if there is a positive number $d$ such that the ball of radius $d$ centered at the origin contains no other point of $L$.


APPENDIX 6 


Fast Matrix Multiplication 


How many scalar multiplications are needed to form the product $C$ of two $n \times n$ matrices $A$ and $B$? Since each entry of $C$ is the product of a row of $A$ with a column of $B$, and since $C$ has $n^2$ entries, we need $n^3$ scalar multiplications, as well as $n^3 - n^2$ additions. It was a great discovery of Volker Strassen that there is a way of multiplying matrices that uses many fewer scalar multiplications and additions. The crux of the idea lies in a clever way of multiplying $2 \times 2$ matrices:

$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \qquad B = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix}, \qquad AB = C,$$

$c_{11} = a_{11}b_{11} + a_{12}b_{21}$, $c_{12} = a_{11}b_{12} + a_{12}b_{22}$, and so on. Define

$$\begin{aligned}
\mathrm{I} &= (a_{11} + a_{22})(b_{11} + b_{22}), \\
\mathrm{II} &= (a_{21} + a_{22})\,b_{11}, \\
\mathrm{III} &= a_{11}(b_{12} - b_{22}), \\
\mathrm{IV} &= a_{22}(b_{21} - b_{11}), \\
\mathrm{V} &= (a_{11} + a_{12})\,b_{22}, \\
\mathrm{VI} &= (a_{21} - a_{11})(b_{11} + b_{12}), \\
\mathrm{VII} &= (a_{12} - a_{22})(b_{21} + b_{22}).
\end{aligned} \tag{1}$$

A straightforward but tedious calculation shows that the entries of the product matrix $C$ can be expressed as follows:

$$\begin{aligned}
c_{11} &= \mathrm{I} + \mathrm{IV} - \mathrm{V} + \mathrm{VII}, & c_{12} &= \mathrm{III} + \mathrm{V}, \\
c_{21} &= \mathrm{II} + \mathrm{IV}, & c_{22} &= \mathrm{I} + \mathrm{III} - \mathrm{II} + \mathrm{VI}.
\end{aligned} \tag{2}$$

The point is that whereas the standard evaluation of the entries in the product matrix uses two multiplications per entry, therefore a total of eight, the seven quantities in (1) need only seven multiplications. The total number of additions and subtractions needed in (1) and (2) is 18.

The formulas (1) and (2) in no way use the commutativity of the quantities a and
b. Therefore, (1) and (2) can be used to multiply 4 x 4 matrices by interpreting the
entries $a_{ij}$ and $b_{ij}$ as 2 x 2 block entries. Proceeding recursively in this fashion, we
can use (1) and (2) to multiply any two matrices A and B of order $2^k$.

How many scalar multiplications M(k) have to be carried out in this scheme? In
multiplying two square matrices of order $2^k$ we have to perform seven
multiplications of blocks of size $2^{k-1} \times 2^{k-1}$. This takes 7M(k - 1) scalar
multiplications. So


$$M(k) = 7M(k - 1).$$

Since M(0) = 1, we deduce that

$$M(k) = 7^k = 2^{k \log_2 7} = n^{\log_2 7}, \qquad (3)$$

where $n = 2^k$ is the order of the matrices to be multiplied.
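
The recursive scheme can be sketched in Python as follows. This is a minimal illustration, not an optimized implementation: it assumes NumPy, integer test matrices of order $2^k$, and a hypothetical one-element list `counter` used only to count scalar multiplications so that the count $7^k$ in (3) can be checked.

```python
import numpy as np

def strassen(A, B, counter):
    """Multiply two square matrices of order 2^k by formulas (1) and (2).

    counter is a one-element list accumulating the number of scalar
    multiplications (an illustrative device, not part of the algorithm).
    """
    n = A.shape[0]
    if n == 1:
        counter[0] += 1              # one scalar multiplication
        return A * B
    h = n // 2
    a11, a12, a21, a22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    b11, b12, b21, b22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # The seven block products of (1). The order of the factors is kept,
    # because blocks do not commute.
    p1 = strassen(a11 + a22, b11 + b22, counter)
    p2 = strassen(a21 + a22, b11, counter)
    p3 = strassen(a11, b12 - b22, counter)
    p4 = strassen(a22, b21 - b11, counter)
    p5 = strassen(a11 + a12, b22, counter)
    p6 = strassen(a21 - a11, b11 + b12, counter)
    p7 = strassen(a12 - a22, b21 + b22, counter)
    # Assemble the product by (2).
    C = np.empty_like(A)
    C[:h, :h] = p1 + p4 - p5 + p7
    C[:h, h:] = p3 + p5
    C[h:, :h] = p2 + p4
    C[h:, h:] = p1 + p3 - p2 + p6
    return C

rng = np.random.default_rng(0)
k = 3
n = 2 ** k
A = rng.integers(-5, 6, size=(n, n))
B = rng.integers(-5, 6, size=(n, n))
count = [0]
C = strassen(A, B, count)
print(np.array_equal(C, A @ B), count[0], 7 ** k)
```

Integer entries are used so that the comparison with the standard product is exact; with floating-point entries the two results would agree only up to rounding.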

Denote by A(k) the number of scalar additions and subtractions needed to multiply
two matrices of order $2^k$ using Strassen's algorithm. We have to perform 18 additions
and 7 multiplications of blocks of size $2^{k-1} \times 2^{k-1}$; the latter takes 7A(k - 1)
additions, the former $18(2^{k-1})^2 = 9 \cdot 2^{2k-1}$. So altogether

$$A(k) = 9 \cdot 2^{2k-1} + 7A(k - 1).$$


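Introduce $B(k) = 7^{-k}A(k)$; then the above recursion can be rewritten as


$$B(k) = \frac{9}{2}\left(\frac{4}{7}\right)^{k} + B(k - 1).$$


Summing with respect to k we get, since B(0) = 0,


$$B(k) = \frac{9}{2}\sum_{j=1}^{k}\left(\frac{4}{7}\right)^{j} < \frac{9}{2}\cdot\frac{4}{7}\cdot\frac{1}{1 - 4/7} = \frac{9}{2}\cdot\frac{4}{3} = 6;$$

therefore

$$A(k) \le 6 \times 7^k = 6 \times 2^{k \log_2 7} = 6n^{\log_2 7}. \qquad (4)$$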
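
As a quick sanity check of the recursion for A(k) and the bound (4), the following sketch iterates the recursion directly. The closed form $6(7^k - 4^k)$ used for comparison is not stated in the text; it is what summing the geometric series for B(k) exactly would give, and is included here only as a cross-check.

```python
def additions(k):
    """Number of additions/subtractions A(k), from A(k) = 9*2^(2k-1) + 7*A(k-1), A(0) = 0."""
    a = 0
    for j in range(1, k + 1):
        a = 9 * 2 ** (2 * j - 1) + 7 * a
    return a

for k in range(1, 6):
    # A(k), the closed form 6*(7^k - 4^k), and the bound 6*7^k from (4)
    print(k, additions(k), 6 * (7 ** k - 4 ** k), 6 * 7 ** k)
```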


Since $\log_2 7 = 2.807\ldots$ is less than 3, the number of scalar multiplications required
in Strassen's algorithm is, for n large, very much less than $n^3$, the number of scalar
multiplications required in the standard way of multiplying matrices.

Matrices whose order is not a power of 2 can be turned into one by adjoining a
suitable number of 1s on the diagonal.

Refinements of Strassen's idea have led to further reduction of the number of
scalar multiplications needed to multiply two matrices. It has been conjectured that
for any positive $\epsilon$ there is an algorithm that computes the product of two n x n
matrices using $cn^{2+\epsilon}$ scalar multiplications, where the constant c depends on $\epsilon$.


APPENDIX 7 


Gershgorin's Theorem 


This result can be used to give very simple estimates on the location of the
eigenvalues of a matrix, crude or accurate depending on the circumstances.


Gershgorin Circle Theorem. Let A be an n x n matrix with complex entries.
Decompose it as


$$A = D + F, \qquad (1)$$


where D is the diagonal matrix equal to the diagonal of A; F has zero diagonal
entries. Denote by $d_i$ the ith diagonal entry of D, and by $f_i$ the ith row of F. Define the
circular disc $C_i$ to consist of all complex numbers z satisfying

$$|z - d_i| \le |f_i|_1, \qquad i = 1, \ldots, n. \qquad (2)$$


The 1-norm of a vector f is the sum of the absolute values of its components; see
Chapter 14. Claim: every eigenvalue of A is contained in one of the discs $C_i$.


Proof. Let u be an eigenvector of A,

$$Au = \lambda u, \qquad (3)$$

normalized as $|u|_\infty = 1$, where the $\infty$-norm is the maximum of the absolute value of
the components $u_j$ of u. Clearly, $|u_j| \le 1$ for all j and $u_i = 1$ for some i. Writing
A = D + F in (3), the ith component can be written as $d_i + f_i u = \lambda$, which can be
rewritten as


$$\lambda - d_i = f_i u.$$

The absolute value of the product $fu$ is $\le |f|_1 |u|_\infty$, so

$$|\lambda - d_i| \le |f_i|_1 |u|_\infty = |f_i|_1. \qquad \square$$


EXERCISE. Show that if $C_i$ is disjoint from all the other Gershgorin discs, then $C_i$
contains exactly one eigenvalue of A.


In many iterative methods for finding the eigenvalues of a matrix A, A is
transformed by a sequence of similarity transformations into $A_k$, so that $A_k$ tends to a
diagonal matrix. Being similar to A, each $A_k$ has the same eigenvalues as A.
Gershgorin's theorem can be used to estimate how closely the diagonal elements of
$A_k$ approximate the eigenvalues of A.
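
The theorem is easy to try out numerically. The sketch below assumes NumPy; the helper name `gershgorin_discs` and the test matrix are illustrative choices, not taken from the text. It computes the centers $d_i$ and radii $|f_i|_1$ of the discs (2) and checks that each computed eigenvalue lies in at least one disc.

```python
import numpy as np

def gershgorin_discs(A):
    """Return centers d_i and radii |f_i|_1 of the Gershgorin discs of A."""
    A = np.asarray(A, dtype=complex)
    centers = np.diag(A)
    # Row sums of absolute values, minus the diagonal term: the 1-norm of row i of F.
    radii = np.sum(np.abs(A), axis=1) - np.abs(centers)
    return centers, radii

A = np.array([[4.0, 1.0, 0.5],
              [0.2, -3.0, 0.1],
              [1.0j, 0.3, 1.0 + 2.0j]])
centers, radii = gershgorin_discs(A)
eigenvalues = np.linalg.eigvals(A)
# For each eigenvalue, test membership in at least one disc (small tolerance for rounding).
in_some_disc = [bool(np.any(np.abs(lam - centers) <= radii + 1e-12)) for lam in eigenvalues]
print(in_some_disc)
```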


APPENDIX 8 


The Multiplicity of Eigenvalues 


The set of n x n real, self-adjoint matrices forms a linear space of dimension 
N = n(n + 1)/2. We have seen at the end of Chapter 9 that the set of degenerate 
matrices, that is, ones with multiple eigenvalues, form a surface of codimension 2, 
that is, of dimension N — 2. This explains the phenomenon of “avoided crossing," 
that is, in general, self-adjoint matrices in a one-parameter family have all distinct 
eigenvalues. By the same token a two-parameter family of self-adjoint matrices 
ought to have a good chance of containing a matrix with a multiple eigenvalue. In 
this appendix we state and prove such a theorem about two-parameter families of the
following form:


$$aA + bB + cC, \qquad a^2 + b^2 + c^2 = 1. \qquad (1)$$


Here A, B, C are real, self-adjoint n x n matrices, and a, b, c are real numbers.


Theorem (Lax). If $n \equiv 2 \pmod 4$, then there exist a, b, c such that (1) is
degenerate, that is, has a multiple eigenvalue. 


Proof. Denote by $\mathcal{N}$ the set of all nondegenerate matrices. For any N in $\mathcal{N}$ denote
by $\lambda_1 < \lambda_2 < \cdots < \lambda_n$ the eigenvalues of N arranged in increasing order and by $u_j$
the corresponding normalized eigenvectors:


$$Nu_j = \lambda_j u_j, \qquad \|u_j\| = 1, \qquad j = 1, \ldots, n. \qquad (2)$$


Note that each $u_j$ is determined only up to a factor $\pm 1$.

Let N(t), $0 \le t \le 2\pi$, be a closed curve in $\mathcal{N}$. If we fix $u_j(0)$, then the normalized
eigenvectors $u_j(t)$ can be determined uniquely as continuous functions of t. Since for
a closed curve $N(2\pi) = N(0)$,


$$u_j(2\pi) = \tau_j u_j(0), \qquad \tau_j = \pm 1. \qquad (3)$$


Linear Algebra and its Applications, Second Edition, by Peter D. Lax 
Copyright © 2007 John Wiley & Sons, Inc. 




The quantities $\tau_j$, $j = 1, \ldots, n$, are functionals of the curve N(t). Clearly:


(i) Each $\tau_j$ is invariant under homotopy, that is, continuous deformation in $\mathcal{N}$.
(ii) For a constant curve, that is, N(t) independent of t, each $\tau_j = 1$.


Suppose the theorem is false, so that every matrix of the form (1) is nondegenerate. Then

$$N(t) = \cos t\, A + \sin t\, B, \qquad 0 \le t \le 2\pi, \qquad (4)$$

is a closed curve in $\mathcal{N}$. Note that N is periodic, and

$$N(t + \pi) = -N(t).$$

It follows that

$$\lambda_j(t + \pi) = -\lambda_{n-j+1}(t)$$

and that

$$u_j(t + \pi) = \rho_j u_{n-j+1}(t), \qquad (5)$$


where $\rho_j = \pm 1$. Since $u_j$ is a continuous function of t, so is $\rho_j$; but since $\rho_j$ can only
take on discrete values, it is independent of t.
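
The eigenvalue relation that follows from $N(t + \pi) = -N(t)$ can be checked numerically. The sketch below assumes NumPy and uses randomly generated symmetric matrices A and B (illustrative data only); `numpy.linalg.eigvalsh` returns the eigenvalues in increasing order, so negating and reversing the spectrum at t should reproduce the spectrum at $t + \pi$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6

def random_symmetric(n):
    """A random real symmetric (self-adjoint) n x n matrix."""
    X = rng.standard_normal((n, n))
    return (X + X.T) / 2

A = random_symmetric(n)
B = random_symmetric(n)

def N(t):
    """The closed curve (4)."""
    return np.cos(t) * A + np.sin(t) * B

t = 0.7
lam = np.linalg.eigvalsh(N(t))                  # increasing order
lam_shifted = np.linalg.eigvalsh(N(t + np.pi))  # increasing order
# lambda_j(t + pi) = -lambda_{n-j+1}(t): negate and reverse the order.
print(np.allclose(lam_shifted, -lam[::-1]))
```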

For each value of t, the eigenvectors $u_1(t), \ldots, u_n(t)$ form an ordered basis. Since
they change continuously they retain their orientation. Thus the two ordered bases


$$u_1(0), \ldots, u_n(0) \quad\text{and}\quad u_1(\pi), \ldots, u_n(\pi) \qquad (6)$$

have the same orientation. By (5),

$$u_1(\pi), \ldots, u_n(\pi) = \rho_1 u_n(0), \ldots, \rho_n u_1(0). \qquad (6)'$$

Reversing the order of a basis for n even is the same as n/2 transpositions. Since
each transposition reverses orientation, for $n \equiv 2 \pmod 4$ we have an odd number of
transpositions. So in order for (6) and (6)' to have the same orientation,


$$\prod_{j=1}^{n} \rho_j = -1.$$


Writing this product as


$$\prod_{j=1}^{n/2} \rho_j \rho_{n-j+1} = -1,$$


we conclude there is an index k for which


$$\rho_k \rho_{n-k+1} = -1. \qquad (7)$$


1. Generating Functions and Simple Rational Cones


PROBLEMS.

1°. Check that a simple rational cone in $\mathbb{R}^d$ is a closed convex cone without
straight lines.

2°. Let $K = \{(\xi_1, \xi_2) : 0 \le \xi_2 \le \xi_1\sqrt{2}\} \subset \mathbb{R}^2$. Prove that K is not a simple
rational cone.

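3°. Let $K = [0, +\infty) \subset \mathbb{R}^1$. Check that

$$f(K, x) = \sum_{m=0}^{\infty} x^m = \frac{1}{1 - x}$$

for all $x \in \mathbb{C}$ such that $|x| < 1$.

4°. Let

$$K = \mathbb{R}^d_+ = \{(\xi_1, \ldots, \xi_d) \in \mathbb{R}^d : \xi_i \ge 0 \text{ for all } i = 1, \ldots, d\}.$$

Prove that

$$f(K, x) = \sum_{m \in \mathbb{Z}^d_+} x^m = \prod_{i=1}^{d} \frac{1}{1 - x_i}$$

for all $(x_1, \ldots, x_d) \in \mathbb{C}^d$ such that $|x_i| < 1$ for $i = 1, \ldots, d$.


5°. Let $u \in \mathbb{Z}^d$ be an integer vector and let $U = \{x \in \mathbb{C}^d : |x^u| < 1\}$. Prove
that

$$\sum_{\mu=0}^{\infty} x^{\mu u} = \frac{1}{1 - x^u}$$

for every $x \in U$ and that the convergence is absolute and uniform on compact
subsets of U.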


Here is our first result. 


(1.2) Lemma. Let

$$K = \operatorname{co}(u_1, \ldots, u_k),$$

where $u_1, \ldots, u_k \in \mathbb{Z}^d$ are linearly independent vectors. Let

$$\Pi = \Big\{ \sum_{i=1}^{k} \alpha_i u_i : 0 \le \alpha_i < 1 \text{ for } i = 1, \ldots, k \Big\}$$

be the "semi-open" parallelepiped spanned by $u_1, \ldots, u_k$. Let

$$U = \{x \in \mathbb{C}^d : |x^{u_i}| < 1 \text{ for } i = 1, \ldots, k\}.$$

Then for all $x \in U$ the series

$$\sum_{m \in K \cap \mathbb{Z}^d} x^m$$

converges absolutely and uniformly on compact subsets of U to the rational function

$$f(K, x) = \Big( \sum_{n \in \Pi \cap \mathbb{Z}^d} x^n \Big) \prod_{i=1}^{k} \frac{1}{1 - x^{u_i}}.$$

Proof. The proof resembles that of Lemma VII.2.1. For a real number $\xi$, let $[\xi]$
be the integer part of $\xi$ (the largest integer not exceeding $\xi$) and let $\{\xi\} = \xi - [\xi]$
be the fractional part of $\xi$.


Let us choose a point $m \in K \cap \mathbb{Z}^d$, so

$$m = \sum_{i=1}^{k} \alpha_i u_i \quad\text{where } \alpha_i \ge 0 \text{ for } i = 1, \ldots, k.$$

Let

$$m_1 = \sum_{i=1}^{k} \{\alpha_i\} u_i \quad\text{and}\quad m_2 = \sum_{i=1}^{k} [\alpha_i] u_i.$$


Thus $m = m_1 + m_2$, $m_1$ is an integer point in $\Pi$, and $m_2$ is a non-negative integer
combination of $u_1, \ldots, u_k$. Hence every point $m \in K \cap \mathbb{Z}^d$ can be represented as
the sum of an integer point from $\Pi$ and a non-negative integer combination of the
vectors $u_i$. As in Lemma VII.2.1, it follows that the representation is unique. It
is also clear that the sum of an integer point from $\Pi$ and a non-negative integer
combination of $u_1, \ldots, u_k$ is an integer point from K.


Therefore, we have

$$\sum_{m \in K \cap \mathbb{Z}^d} x^m = \Big( \sum_{n \in \Pi \cap \mathbb{Z}^d} x^n \Big) \Big( \sum_{(\nu_1, \ldots, \nu_k) \in \mathbb{Z}^k_+} x^{\nu_1 u_1 + \cdots + \nu_k u_k} \Big)$$

(as formal power series). The second factor is a multiple geometric series which
sums up to

$$\prod_{i=1}^{k} \frac{1}{1 - x^{u_i}},$$

and the result follows; cf. Problem 5 of Section 1.1. □
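
A small numerical check of the lemma, as a sketch in plain Python. The cone is an illustrative choice not taken from the text: $K = \operatorname{co}(u_1, u_2)$ with $u_1 = (1, 0)$, $u_2 = (1, 2)$, so that K consists of the points $(m_1, m_2)$ with $0 \le m_2 \le 2m_1$, and the semi-open parallelepiped contains exactly two integer points, (0, 0) and (1, 1). The lattice points of K are enumerated directly from the inequalities, independently of the decomposition used in the proof, and the truncated series is compared with the rational function.

```python
# Cone K = co((1, 0), (1, 2)) in the plane: lattice points (m1, m2) with 0 <= m2 <= 2*m1.
x1, x2 = 0.3, 0.4          # |x^{u_1}| = 0.3 < 1 and |x^{u_2}| = 0.3 * 0.4**2 < 1
M = 200                    # truncation level for m1; the neglected tail is negligible here

series = sum(x1 ** m1 * x2 ** m2
             for m1 in range(M + 1)
             for m2 in range(2 * m1 + 1))

# Integer points of the semi-open parallelepiped: (0, 0) and (1, 1).
numerator = 1 + x1 * x2
closed_form = numerator / ((1 - x1) * (1 - x1 * x2 ** 2))

print(series, closed_form)
```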


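PROBLEMS.


1°. Let $K \subset \mathbb{R}^d$ be a simple rational cone as in Lemma 1.2 and let

$$\tilde{\Pi} = \Big\{ \sum_{i=1}^{k} \alpha_i u_i : 0 < \alpha_i \le 1 \text{ for } i = 1, \ldots, k \Big\}.$$


Let int K denote the interior of K considered as a convex set in its affine hull. Prove
that

$$f(\operatorname{int} K, x) = \sum_{m \in \operatorname{int} K \cap \mathbb{Z}^d} x^m = \Big( \sum_{n \in \tilde{\Pi} \cap \mathbb{Z}^d} x^n \Big) \prod_{i=1}^{k} \frac{1}{1 - x^{u_i}}$$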


provided $|x^{u_i}| < 1$ for $i = 1, \ldots, k$.

and from the last equation that $1 = \lambda^n$. This shows that the eigenvalues $\lambda$ are the n
roots of unity $\lambda_j = \exp(2\pi i j / n)$, and the corresponding eigenvectors

$$e_j = (\lambda_j, \lambda_j^2, \lambda_j^3, \ldots, \lambda_j^n). \qquad (1)$$

Each eigenvector $e_j$ has norm $\|e_j\| = \sqrt{n}$.
Every vector $u = (u_1, \ldots, u_n)$ can be expressed as a linear combination of
eigenvectors:


$$u = \sum_{j=1}^{n} a_j e_j. \qquad (2)$$


Using the explicit expression (1) of the eigenvector, we can rewrite (2) as


$$u_k = \sum_{j=1}^{n} a_j \exp\Big(\frac{2\pi i}{n} jk\Big). \qquad (2)'$$


Using the orthogonality of the eigenvectors, along with their norm $\|e_j\| = \sqrt{n}$, we
can express the coefficients $a_j$ as

$$a_j = (u, e_j)/n = \frac{1}{n} \sum_{k=1}^{n} u_k \exp\Big(-\frac{2\pi i}{n} jk\Big). \qquad (3)$$
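
Formulas (1), (2), and (3) can be verified directly for a small n. The sketch below assumes NumPy; the matrix `E`, whose rows hold the eigenvectors $e_j$, and the random test vector are illustrative names and data only.

```python
import numpy as np

n = 8
k = np.arange(1, n + 1)
# Row j-1 of E is the eigenvector e_j = (lambda_j, lambda_j^2, ..., lambda_j^n), formula (1).
E = np.array([np.exp(2j * np.pi * j * k / n) for j in range(1, n + 1)])

rng = np.random.default_rng(0)
u = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# Formula (3): a_j = (u, e_j)/n = (1/n) * sum_k u_k * exp(-2*pi*i*j*k/n).
a = E.conj() @ u / n
# Formula (2): u = sum_j a_j e_j.
u_reconstructed = a @ E

print(np.allclose(np.linalg.norm(E, axis=1), np.sqrt(n)))
print(np.allclose(u_reconstructed, u))
```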


It is instructive to compare formulas (2)' and (3) to the expansion of periodic
functions in terms of their Fourier series. But first we have to rewrite equation (2)' as
follows. Suppose n is odd; we rewrite the last (n - 1)/2 terms in the sum (2)' by
introducing a new index of summation l related to j by j = n - l. Then


$$\exp\Big(\frac{2\pi i}{n} jk\Big) = \exp\Big(\frac{2\pi i}{n}(n - l)k\Big) = \exp\Big(-\frac{2\pi i}{n} lk\Big).$$


Setting this into (2)', we can rewrite it as


$$u_k = \sum_{j=-(n-1)/2}^{(n-1)/2} a_j \exp\Big(\frac{2\pi i}{n} jk\Big), \qquad (4)$$

where we have reverted to denote the index of summation by j and where $a_j$ is
defined by (3). A similar formula holds when n is even.
Let u(x) be a periodic function of period 1. Its Fourier series representation is


$$u(x) = \sum_j b_j \exp(2\pi i j x), \qquad (5)$$

where the Fourier coefficients are given by the formula


$$b_j = \int_0^1 u(x) \exp(-2\pi i j x)\, dx. \qquad (6)$$


Setting x = k/n into (5) gives


$$u\Big(\frac{k}{n}\Big) = \sum_j b_j \exp\Big(\frac{2\pi i}{n} jk\Big). \qquad (5)'$$


The integral (6) can be approximated by a finite sum at the equidistant points
$x_k = k/n$:


$$b_j \approx \frac{1}{n} \sum_{k=1}^{n} u\Big(\frac{k}{n}\Big) \exp\Big(-\frac{2\pi i}{n} jk\Big). \qquad (6)'$$


Suppose u is a smooth periodic function, say d times differentiable. Then the
(n - 1)/2 section of the Fourier series (5) is an excellent approximation to u(x):


$$u(x) = \sum_{j=-(n-1)/2}^{(n-1)/2} b_j \exp(2\pi i j x) + O(n^{-d}). \qquad (7)$$


Similarly, the approximating sum on the right-hand side of (6)' differs by $O(n^{-d})$
from $b_j$. It follows that if in (3) we take $u_k = u(k/n)$, $a_j$ differs from $b_j$ by $O(n^{-d})$.

When u is a smooth periodic function, its derivatives may be calculated by
differentiating its Fourier series term by term:


$$\partial_x^m u(x) = \sum_j b_j (2\pi i j)^m \exp(2\pi i j x).$$


The truncated series is an excellent approximation to $\partial_x^m u$. It follows therefore that


$$\sum_{j=-(n-1)/2}^{(n-1)/2} a_j (2\pi i j)^m \exp\Big(\frac{2\pi i}{n} jk\Big)$$


is an excellent approximation to $\partial_x^m u(k/n)$, provided that $u_k$ in (3) is taken as $u(k/n)$.
Therein lies the utility of the finite Fourier transform: It can be used to obtain highly
accurate approximations to derivatives of smooth periodic functions, which can then
be used to construct very accurate approximate solutions of differential equations.
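
A minimal sketch of this use of the finite Fourier transform, assuming NumPy and an illustrative test function $u(x) = \exp(\sin 2\pi x)$ that is not taken from the text. NumPy indexes the sample points as k = 0, ..., n - 1 rather than k = 1, ..., n; for a function of period 1 the two conventions give the same coefficients, since $u(n/n) = u(0)$.

```python
import numpy as np

n = 33                                   # odd, as in the derivation of (4)
x = np.arange(n) / n                     # equidistant points k/n, k = 0, ..., n-1
u = np.exp(np.sin(2 * np.pi * x))        # smooth function of period 1

# Coefficients a_j of (3); numpy.fft.fft computes sum_k u_k exp(-2*pi*i*j*k/n).
a = np.fft.fft(u) / n
# Signed frequencies 0, 1, ..., (n-1)/2, -(n-1)/2, ..., -1, matching the sum in (4).
freq = np.fft.fftfreq(n, d=1.0 / n)

# First derivative (m = 1): sum_j a_j (2*pi*i*j) exp(2*pi*i*j*k/n).
du = np.real(np.fft.ifft(a * (2j * np.pi * freq)) * n)

exact = 2 * np.pi * np.cos(2 * np.pi * x) * u
print(np.max(np.abs(du - exact)))
```

The printed number is the largest pointwise error of the approximate derivative at the sample points.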

On the other hand, operations such as multiplication of u by a given smooth 
function can be carried out fast and accurately when u is represented by its values at 
the points k/n. Since in the course of a calculation the operation of differentiation 


2. Generating Functions and Rational Cones


PROBLEMS.

1°. Let $P \subset \mathbb{R}^d$ be a rational polyhedron and let v be a vertex of P. Prove
that v has rational coordinates.

Hint: Cf. Theorem II.4.2.

2°. Let $P \subset \mathbb{R}^d$ be a bounded rational polyhedron. Prove that P is a rational
polytope.

Hint: Use Problem 1 above and Corollary II.4.3.

3°. Let $P \subset \mathbb{R}^d$ be a rational polytope. Prove that the polar $P^\circ \subset \mathbb{R}^d$ is a
rational polyhedron.

Hint: Cf. Problem 7 of Section IV.1.1.

4°. Let $P \subset \mathbb{R}^d$ be a rational polytope. Prove that P is a rational polyhedron.

Hint: Use Problem 3 above and Corollary IV.1.3.


5°. Let $P \subset \mathbb{R}^d$ be a rational polytope. Prove that there exists a positive integer
$\delta$ such that $\delta P$ is an integer polytope.


Next, we prove that a rational cone without straight lines has an integer
polytope as a base; see Definition 1.8.3.


(2.2) Lemma. Let $K \subset \mathbb{R}^d$, $K \ne \{0\}$, be a rational cone without straight lines.
Then there exists an integer polytope $Q \subset \mathbb{R}^d$ which is a base of K. In other words,
there exist points $v_1, \ldots, v_n \in \mathbb{Z}^d$ such that every point $x \in K \setminus \{0\}$ has a unique
representation $x = \lambda y$ for $y \in Q = \operatorname{conv}(v_1, \ldots, v_n)$ and $\lambda > 0$.


Proof. Suppose that

$$K = \{x : \langle c_i, x \rangle \le 0 \text{ for } i = 1, \ldots, m\},$$

where $c_i \in \mathbb{Z}^d$. Let $c = c_1 + \cdots + c_m$, so c is an integer vector. Let us prove that
$\langle c, x \rangle < 0$ for all $x \in K \setminus \{0\}$.

Clearly, $\langle c, x \rangle \le 0$ for every $x \in K$. On the other hand, if $\langle c, x \rangle = 0$ for some
$x \in K$, then we must have $\langle c_i, x \rangle = 0$ for $i = 1, \ldots, m$ (if the sum of non-positive
numbers is 0, each number should be equal to 0). Since we assumed that K does
not contain straight lines, we must have x = 0.


In particular, since $K \ne \{0\}$, we have $c \ne 0$.


Let us define an affine hyperplane

$$H = \{x \in \mathbb{R}^d : \langle c, x \rangle = -1\}$$

and let $P = K \cap H$. Hence for every $x \in K \setminus \{0\}$ there is a $\lambda > 0$ such that $\lambda x \in P$.
Thus P is a base of K.


Clearly, P is a rational polyhedron. We claim that P is a polytope. To demonstrate
this, we prove that P does not contain rays (see Section II.16). Indeed,
suppose that P contains a ray $a + \tau b$ for $\tau \ge 0$. Then $b \ne 0$ and we must have
$\langle c_i, b \rangle \le 0$ for $i = 1, \ldots, m$ and hence $b \in K$. On the other hand, we must have
$\langle c, b \rangle = 0$, which is a contradiction. By Lemma II.16.3, P must be a convex hull of
the set of its extreme points and hence, by Theorem II.4.2, P must be a polytope.
Finally, by Problem 2 of Section 2.1, P is a rational polytope.


Choosing $Q = \delta P$ for some appropriate positive integer $\delta$ (cf. Problem 5 of
Section 2.1), we obtain an integer polytope Q which is a base of K. □


PROBLEM. 


1. Prove that $K \subset \mathbb{R}^d$ is a rational cone if and only if K can be written as
$K = \operatorname{co}(u_1, \ldots, u_n)$ for some $u_1, \ldots, u_n \in \mathbb{Z}^d$.


To reduce the case of a rational cone to the case of a simple rational cone, we
need an intuitively obvious, although not-so-easy-to-prove, fact that every polytope
admits a triangulation, that is, it can be represented as a union of simplices such
that every two simplices can intersect only at a common face.


Figure 85. A triangulation of a polygon 
Since a rigorous proof may require considerable effort, we sketch only a possible 
approach below; see Chapter 9 of [Z95]. 
(2.3) Lemma. Let $P \subset \mathbb{R}^d$ be a polytope with the vertices $v_1, \ldots, v_n$. There exists
a partition $I_1 \cup \ldots \cup I_m = \{1, \ldots, n\}$ such that for the polytopes

$$\Delta_j = \operatorname{conv}(v_i : i \in I_j), \qquad j = 1, \ldots, m,$$

we have
1. the points $\{v_i : i \in I_j\}$ are affinely independent for all $j = 1, \ldots, m$ and
$\dim \Delta_j = \dim P$ for $j = 1, \ldots, m$;
2.
$$P = \bigcup_{j=1}^{m} \Delta_j;$$
3. the intersection $\Delta_j \cap \Delta_k$, if non-empty, is a proper common face of $\Delta_j$ and
$\Delta_k$.

There are some additional computational expenses: additions and rearrangements
of vectors. The total amount of work is $5n \log_2 n$ flops (floating point operations).

The inverse operation, expressing $u_k$ in terms of the $a_j$ [see (2)'], is the same,
except that $\omega$ is replaced by $\bar{\omega}$, and there is no division by n.

There is an interesting discussion of the history of the Fast Fourier Transform in 
the 1968 Arden House Workshop. 

When Cooley and Tukey's paper on the Fast Fourier Transform appeared, 
Mathematical Reviews reviewed it by title only; the editors did not grasp its 
importance.

APPENDIX 10


The Spectral Radius


Let X be a finite-dimensional Euclidean space, and let A be a linear mapping of X
into X. Denote by r(A) the spectral radius of A:


$$r(A) = \max_i |a_i|, \qquad (1)$$

where $a_i$ ranges over all eigenvalues of A. We claim that


$$\lim_{j \to \infty} \|A^j\|^{1/j} = r(A), \qquad (2)$$


where $\|A^j\|$ denotes the norm of the jth power of A.
Proof. A straightforward estimate [see inequality (48) of Chapter 7] shows that

$$\|A^j\|^{1/j} \ge r(A). \qquad (3)$$

We shall show that

$$\limsup_{j \to \infty} \|A^j\|^{1/j} \le r(A). \qquad (4)$$

Combining (3) and (4) gives (2).
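
Relation (2) can be observed numerically. The sketch below assumes NumPy and an illustrative non-normal matrix (a 2 x 2 Jordan-like block with spectral radius 0.5), for which the norm of A itself is considerably larger than r(A); the quantities $\|A^j\|^{1/j}$ stay above r(A), as in (3), and approach it as j grows.

```python
import numpy as np

A = np.array([[0.5, 1.0],
              [0.0, 0.5]])                      # non-normal; both eigenvalues equal 0.5
r = max(abs(np.linalg.eigvals(A)))              # spectral radius r(A)

powers = [1, 10, 100, 500]
# Euclidean operator norm of A^j, raised to the power 1/j.
values = [np.linalg.norm(np.linalg.matrix_power(A, j), 2) ** (1.0 / j) for j in powers]

for j, v in zip(powers, values):
    print(j, v)
print("r(A) =", r)
```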
We can introduce an orthonormal basis in X, thereby turning X into $\mathbb{C}^n$, with the
standard Euclidean norm $(|x_1|^2 + \cdots + |x_n|^2)^{1/2}$, and A into an n x n matrix with
complex entries. We start with the Schur factorization of A:


Theorem 1. Every square matrix A with complex entries can be factored as


$$A = QTQ^*, \qquad (5)$$

where Q is unitary and T is upper triangular:

$$t_{ij} = 0 \quad\text{for } i > j.$$

(5) is called a Schur factorization of A.


Proof. If A is a normal matrix, then according to Theorem 8 of Chapter 8 it has a
complete set of orthogonal eigenvectors $q_1, \ldots, q_n$, with eigenvalues $a_1, \ldots, a_n$:


$$Aq_j = a_j q_j. \qquad (6)$$

Choose the $q_j$ to have norm 1, and define the matrix Q as


$$Q = (q_1, \ldots, q_n).$$


Since the columns of Q are pairwise orthogonal unit vectors, Q is unitary. Equations
(6) can be expressed in terms of Q as


$$AQ = QD, \qquad (6)'$$


where D is the diagonal matrix with $D_{jj} = a_j$. Multiplying (6)' by $Q^*$ on the right
gives a Schur factorization of A, with T = D.

For arbitrary A we argue inductively on the order n of A. 

We have shown at the beginning of Chapter 6 that every n x n matrix A has at
least one eigenvalue a, possibly complex:


$$Aq = aq. \qquad (7)$$

Choose q to have norm 1, and complete it to an orthonormal basis $q_1, \ldots, q_n$, $q_1 = q$,
and define the matrix U to have columns $q_1, \ldots, q_n$:

$$U = (q_1, \ldots, q_n).$$


Clearly, U is a unitary matrix, and the first column of AU is aq:


$$AU = (aq, c_2, \ldots, c_n). \qquad (7)'$$


The adjoint U* of U has the form

Figure 97. The reader who imagines the positive direction as upward
may need to view this picture upside down.


We obtain f(P, x) by differentiating $f(K, (x, x_{d+1}))$ with respect to $x_{d+1}$ and
substituting $x_{d+1} = 0$ into the derivative.

Indeed, we observe that for every lattice point $(m, \mu) \in K$ the last coordinate
$\mu$ is non-negative. By a standard result in complex analysis, we can differentiate
the series and conclude that the series


$$(3.1.1) \qquad \sum_{(m, \mu) \in K \cap \mathbb{Z}^{d+1}} \mu\, x^m x_{d+1}^{\mu - 1} = \sum_{m \in P \cap \mathbb{Z}^d} x^m + \sum_{\substack{(m, \mu) \in K \cap \mathbb{Z}^{d+1} \\ \mu \ge 2}} \mu\, x^m x_{d+1}^{\mu - 1}$$


converges absolutely and uniformly on compact sets in $U_1$ to a rational function

$$\frac{\partial}{\partial x_{d+1}} f\big(K, (x, x_{d+1})\big).$$


Let $U \subset \mathbb{C}^d$ be the projection of $U_1$: $(x, x_{d+1}) \mapsto x$. Substituting $x_{d+1} = 0$ in
(3.1.1), we conclude that for every $x \in U$ the series


$$\sum_{m \in P \cap \mathbb{Z}^d} x^m$$


converges absolutely and uniformly on compact subsets of U to the rational function


$$f(P, x) = \frac{\partial}{\partial x_{d+1}} f\big(K, (x, x_{d+1})\big) \Big|_{x_{d+1} = 0}. \qquad \square$$

PROBLEMS.
1°. In the situation of Lemma 3.1, let $m \in \mathbb{Z}^d$ be a lattice vector and let P + m
be the translation of P. Prove that


$$f(P + m, x) = x^m f(P, x).$$


Here is a continuous version of Lemma 3.1. 


2. Let $P \subset \mathbb{R}^d$ be a polyhedron without straight lines. Prove that there exists
a non-empty open set $U \subset \mathbb{C}^d$ and a rational function $\phi : \mathbb{C}^d \to \mathbb{C}$ such that for
all $c \in U$ we have


$$\int_P \exp\{\langle c, x \rangle\}\, dx = \phi(c)$$


and the integral converges absolutely. Again, we let $\langle c, x \rangle = \langle a, x \rangle + i \langle b, x \rangle$ for a
complex vector c = a + ib.


Hint: Use Problem 2 of Section 2.4 and the trick of Lemma 3.1. Instead of
differentiating, use the Laplace transform.


3*. Let $u_1, \ldots, u_n \in \mathbb{Z}^d$ be vectors such that the cone $K = \operatorname{co}(u_1, \ldots, u_n)$ does
not contain straight lines. Let


$$S = \Big\{ \sum_{i=1}^{n} \alpha_i u_i, \text{ where } \alpha_1, \ldots, \alpha_n \text{ are non-negative integers} \Big\}$$


be the semigroup generated by $u_1, \ldots, u_n$; cf. Problem 6 of Section 1.2. Prove
that there exists a non-empty open set $U \subset \mathbb{C}^d$ such that for all $x \in U$ the series
$\sum_{m \in S} x^m$ converges absolutely and uniformly on compact subsets of U to a rational
function in x.


Hint: Let $\mathbb{R}^n_+$ be the non-negative orthant in $\mathbb{R}^n$. Construct a linear
transformation $T : \mathbb{R}^n \to \mathbb{R}^d$ such that $T(\mathbb{Z}^n_+) = S$. Construct a set $Q \subset \mathbb{R}^n_+$ which is a
finite union of rational polyhedra and such that the restriction $T : Q \cap \mathbb{Z}^n \to S$ is
a bijection. Apply Lemma 3.1.


We are getting ready to prove the central result of this chapter. We state it
in the form of the existence theorem for some particular valuation; cf. Sections
I.7 and I.8. We need the rational analogue of the algebra $\mathcal{P}(\mathbb{R}^d)$ of polyhedra; see
Definition I.9.3.


(3.2) Definitions. The real vector space spanned by the indicator functions [P]
of rational polyhedra $P \subset \mathbb{R}^d$ is called the algebra of rational polyhedra in $\mathbb{R}^d$ and
denoted $\mathcal{P}(\mathbb{Q}^d)$. Let $\mathbb{C}(x_1, \ldots, x_d)$ denote the complex vector space of all rational
functions in d variables.

where c is some positive number. Using the triangle inequality, we conclude that

$$\|T_\epsilon\| = \|D + S_\epsilon\| \le r(A) + c\epsilon. \qquad (15)$$

It follows from (14) that

$$T = D_\epsilon T_\epsilon D_\epsilon^{-1}.$$

Set this in the Schur factorization (5) of A:

$$A = QD_\epsilon T_\epsilon D_\epsilon^{-1} Q^*. \qquad (16)$$

Denote $QD_\epsilon$ by M. Since Q is unitary, $Q^* = Q^{-1}$, and (16) can be rewritten as

$$A = MT_\epsilon M^{-1}. \qquad (16)'$$

It follows that

$$A^j = MT_\epsilon^j M^{-1}.$$

Using the multiplicative inequality for the matrix norm, we obtain the inequality

$$\|A^j\| \le \|M\| \|M^{-1}\| \|T_\epsilon\|^j.$$

Taking the jth root, we get

$$\|A^j\|^{1/j} \le m^{1/j} \|T_\epsilon\|,$$

where $m = \|M\| \|M^{-1}\|$. Using the estimate (15) on the right gives

$$\|A^j\|^{1/j} \le m^{1/j}(r(A) + c\epsilon).$$

Now let j tend to $\infty$; we get that

$$\limsup_{j \to \infty} \|A^j\|^{1/j} \le r(A) + c\epsilon.$$

Since this holds for all positive $\epsilon < 1$, we have

$$\limsup_{j \to \infty} \|A^j\|^{1/j} \le r(A),$$

as asserted in (4). As noted there, (4) and (3) imply (2). □


EXERCISE 2. Show that the Euclidean norm of a diagonal matrix is the 
maximum of the absolute value of its eigenvalues.

Note 1. The crucial step in the argument is to show that every matrix is similar to
an upper triangular matrix. We could have appealed to the result in Chapter 6 that 
every matrix is similar to a matrix in Jordan form, since matrices in Jordan form are 
upper triangular. Since the proof of the Jordan form is delicate, we preferred to base 
the argument on Schur factorization, whose proof is more robust. 


EXERCISE 3. Prove the analogue of relation (2),


$$\lim_{j \to \infty} |A^j|^{1/j} = r(A), \qquad (17)$$


when A is a linear mapping of any finite-dimensional normed, linear space X (see 
Chapters 14 and 15). 


Note 2. Relation (17) holds for mappings in infinite-dimensional spaces as well. 
The proof given above relies heavily on the spectral theory of linear mappings in 
finite-dimensional spaces, which has no infinite-dimensional analogue. We shall 
therefore sketch another approach to relation (17) that has a straightforward 
extension to infinite-dimensional normed linear spaces. This approach is based on 
the notion of matrix-valued analytic functions. 


Definition 1. Let z = x + iy be a complex variable, A(z) an n x n matrix-valued
function of z. A(z) is an analytic function of z in a domain G of the z plane if
all entries $a_{ij}(z)$ of A(z) are analytic functions of z in G.


Definition 2. X is a finite-dimensional normed linear space, and A(z) is a family
of linear mappings of X into X, depending on the complex parameter z. A(z) depends
analytically on z in a domain G if the limit


$$\lim_{h \to 0} \frac{A(z + h) - A(z)}{h}$$


exists in the sense of convergence defined in equation (16) of Chapter 15.
EXERCISE 4. Show that the two definitions are equivalent. 


EXERCISE 5. Let A(z) be an analytic matrix function in a domain G, invertible at
every point of G. Show that then $A^{-1}(z)$, too, is an analytic matrix function in G.


EXERCISE 6. Show that the Cauchy integral theorem holds for matrix-valued
functions.


The analytic functions we shall be dealing with are resolvents. The resolvent of A
is defined as


$$R(z) = (zI - A)^{-1} \qquad (18)$$

for all z not an eigenvalue of A. It follows from Exercise 5 that R(z) is an analytic
function.


Theorem 2. For |z| > |A|, R(z) has the expansion


$$R(z) = \sum_{n=0}^{\infty} \frac{A^n}{z^{n+1}}. \qquad (19)$$


Proof. By the multiplicative estimate $|A^n| \le |A|^n$ it follows that the series on the
right-hand side of (19) converges for |z| > |A|.

Multiply (19) by (zI - A); term-by-term multiplication gives I on the right-hand
side. This proves that (19) is the inverse of (zI - A). □
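
The expansion (19) can be checked numerically by comparing a partial sum of the series with a directly computed inverse. The sketch assumes NumPy, a random 3 x 3 test matrix, and the Euclidean operator norm as the matrix norm |A|; the point z = 2|A| is an arbitrary choice satisfying |z| > |A|.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
norm_A = np.linalg.norm(A, 2)          # operator norm |A|
z = 2.0 * norm_A                       # a point with |z| > |A|

partial_sum = np.zeros((3, 3))
term = np.eye(3) / z                   # the n = 0 term, A^0 / z^1
for _ in range(200):
    partial_sum += term
    term = term @ A / z                # next term A^(n+1) / z^(n+2)

resolvent = np.linalg.inv(z * np.eye(3) - A)
print(np.allclose(partial_sum, resolvent))
```

Each term is smaller in norm than the previous one by at least the factor |A|/|z| = 1/2, so 200 terms are far more than needed here.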


Multiply (19) by $z^j$ and integrate it over any circle |z| = s > |A|. On the
right-hand side we integrate term by term; only the jth integral is $\ne 0$, so we get


$$\int_{|z| = s} R(z) z^j\, dz = 2\pi i A^j. \qquad (20)$$


Since R(z) is an analytic function outside the spectrum of A, we can, according to
Exercise 6, deform the circle of integration to any circle of radius
$s = r(A) + \epsilon$, $\epsilon > 0$:


$$\int_{|z| = r(A) + \epsilon} R(z) z^j\, dz = 2\pi i A^j. \qquad (20)'$$


To estimate the norm of $A^j$ from its integral representation (20)', we rewrite the dz
integration in terms of $d\theta$ integration, where $\theta$ is the polar angle, $z = se^{i\theta}$ and
$dz = sie^{i\theta}\, d\theta$:


$$A^j = \frac{1}{2\pi} \int_0^{2\pi} R(se^{i\theta})\, s^{j+1} e^{i(j+1)\theta}\, d\theta. \qquad (21)$$
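
The integral representation (21) lends itself to a numerical check, sketched below under stated assumptions: NumPy, an illustrative 2 x 2 matrix with r(A) = 2 whose Euclidean norm exceeds 2.1, a circle of radius s = r(A) + 0.1 (so smaller than |A|), and the equally spaced Riemann sum with K points as the quadrature rule, which is very accurate for smooth periodic integrands.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, -1.0]])
r = max(abs(np.linalg.eigvals(A)))       # spectral radius, equal to 2 here
s = r + 0.1                              # radius of the circle of integration
power = 3                                # the exponent j in (21)
K = 1024                                 # number of quadrature points

identity = np.eye(2)
total = np.zeros((2, 2), dtype=complex)
for theta in 2 * np.pi * np.arange(K) / K:
    z = s * np.exp(1j * theta)
    resolvent = np.linalg.inv(z * identity - A)          # R(z) = (zI - A)^(-1)
    total += resolvent * s ** (power + 1) * np.exp(1j * (power + 1) * theta)

# (1 / 2 pi) * integral over [0, 2 pi], approximated by the mean over the K points.
A_power = total / K
print(np.allclose(A_power, np.linalg.matrix_power(A, power)))
```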


The norm of an integral of linear maps is bounded by the maximum of the integrand
times the length of the interval of integration. Since R(z) is an analytic function, it is
continuous on the circle $|z| = r(A) + \epsilon$, $\epsilon > 0$; denote the maximum of |R(z)| on
this circle by $c(\epsilon)$. We can then estimate the norm of $A^j$ from its integral
representation (21), with $s = r(A) + \epsilon$, as follows:


$$|A^j| \le (r(A) + \epsilon)^{j+1} c(\epsilon).$$

Take the jth root:

$$|A^j|^{1/j} \le m(\epsilon)^{1/j} (r(A) + \epsilon), \qquad (22)$$

where $m(\epsilon) = (r(A) + \epsilon)\, c(\epsilon)$. Let j tend to $\infty$ in (22); we get


$$\limsup_{j \to \infty} |A^j|^{1/j} \le r(A) + \epsilon.$$


Since this holds for all positive $\epsilon$, no matter how small,

$$\limsup_{j \to \infty} |A^j|^{1/j} \le r(A).$$

On the other hand, analogously to (3),

$$|A^j| \ge r(A)^j$$

for any norm. Taking the jth root gives

$$|A^j|^{1/j} \ge r(A).$$

Combining this with the upper bound just derived from (21), we deduce (17). □


This proof nowhere uses the finite dimensionality of the normed linear space on 
which A acts. 


APPENDIX 11 


The Lorentz Group 


1. In classical mechanics, particles and bodies are located in absolute, motionless
space equipped with Euclidean structure. Motion of particles is described by giving
their position in absolute space as a function of an absolute time.

In the relativistic description, there is no absolute space and time, because space
and time are inseparable. The speed of light is the same in two coordinate systems
moving with constant velocity with respect to each other. This can be expressed by
saying that the Minkowski metric $t^2 - x^2 - y^2 - z^2$ is the same in both coordinate
systems; here we have taken the speed of light to have the numerical value 1.

A linear transformation of four-dimensional space-time that preserves the
quadratic form $t^2 - x^2 - y^2 - z^2$ is called a Lorentz transformation. In this chapter
we shall investigate their properties.

We start with the slightly simpler (2 + 1)-dimensional space-time. Denote by u
the space-time vector (t, x, y)', and denote by M the matrix


$$M = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix}. \qquad (1)$$

Clearly,

$$t^2 - x^2 - y^2 = (u, Mu), \qquad (2)$$


where ( , ) denotes the standard scalar product in $\mathbb{R}^3$.
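
As a concrete illustration of a transformation preserving the quadratic form (2), the sketch below uses a hyperbolic rotation (a "boost") in the (t, x) plane. The boost matrix and its parameter `phi` are an illustrative example not taken from the text; NumPy is assumed. The check is that (Lu, MLu) = (u, Mu) for a random vector u, which is equivalent to the matrix identity L'ML = M.

```python
import numpy as np

M = np.diag([1.0, -1.0, -1.0])           # the matrix (1)

phi = 0.8                                # boost parameter (illustrative value)
L = np.array([[np.cosh(phi), np.sinh(phi), 0.0],
              [np.sinh(phi), np.cosh(phi), 0.0],
              [0.0,          0.0,          1.0]])

rng = np.random.default_rng(0)
u = rng.standard_normal(3)               # a space-time vector (t, x, y)'

before = u @ M @ u                       # t^2 - x^2 - y^2
after = (L @ u) @ M @ (L @ u)            # the same form evaluated at Lu
print(np.isclose(before, after), np.allclose(L.T @ M @ L, M))
```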
The condition that the Lorentz transformation L preserve the quadratic form (2) is
that for all u,


$$(Lu, MLu) = (u, Mu). \qquad (3)$$

(4.3) Lemma. Let $\Delta \subset \mathbb{R}^{d+1}$ be the standard d-dimensional simplex, $\Delta =
\operatorname{conv}(e_i : i = 1, \ldots, d + 1)$, where $e_1, \ldots, e_{d+1}$ are the standard basis vectors. Then
one can find rational polyhedra $P_k \subset \mathbb{R}^{d+1}$, $k = 1, \ldots, N$, such that
1. each polyhedron $P_k$ contains a straight line parallel to $e_i - e_j$ for some pair
$1 \le i < j \le d + 1$;
2. we have


$$[\Delta] - \sum_{i=1}^{d+1} [\operatorname{cone}(\Delta, e_i)] = \sum_{k=1}^{N} \alpha_k [P_k] \quad\text{for some } \alpha_k \in \{-1, 1\}.$$


In particular, modulo $\mathcal{P}_0(\mathbb{Q}^{d+1})$, the indicator function of the standard simplex is the
sum of the indicator functions of the support cones at its vertices.


Proof. Let us identify $\mathbb{R}^d$ with the affine hull of $e_1, \ldots, e_{d+1}$. Let $H_i^+$ be the closed
halfspace $\xi_i \ge 0$ in $\mathbb{R}^d$. Then

$$\Delta = \bigcap_{i=1}^{d+1} H_i^+ \quad\text{and}\quad \operatorname{cone}(\Delta, e_i) = \bigcap_{j \ne i} H_j^+.$$


By the Inclusion-Exclusion Formula (see Lemma I.7.2), we have


$$\Big[ \bigcup_{i=1}^{d+1} H_i^+ \Big] = \sum_{\substack{I \subset \{1, \ldots, d+1\} \\ I \ne \emptyset}} (-1)^{|I| - 1} \Big[ \bigcap_{i \in I} H_i^+ \Big].$$

Let

$$P_I = \bigcap_{i \in I} H_i^+.$$

If $I = \{1, \ldots, d + 1\}$, we have $P_I = \Delta$. If $I = \{1, \ldots, d + 1\} \setminus \{i\}$, we have
$P_I = \operatorname{cone}(\Delta, e_i)$. All other polyhedra $P_I$ contain straight lines. In particular, if
$i, j \notin I$, then $P_I$ contains a straight line in the direction of $e_i - e_j$. □
PROBLEMS.


1°. Let cone(P, F) be the support cone of P at a face $F \subset P$; see Problem 3 of
Section 4.1. Let us fix a $0 \le k < d$. Prove that for the standard simplex $\Delta$


$$[\Delta] - \sum_{\substack{F \text{ is a face of } \Delta \\ \dim F \le k}} (-1)^{\dim F} [\operatorname{cone}(\Delta, F)]$$


is a linear combination of the indicator functions of polyhedra $P_i$ each of which
contains a (k + 1)-dimensional affine subspace.
2°. Let $P \subset \mathbb{R}^n$ be a rational polyhedron and let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear


transformation with a rational matrix. Prove that T( P) is a rational polyhedron 
in $\mathbb{R}^m$.

If L maps (1, 0, 0)' into the blc, we argue analogously; this completes the proof of
Theorem 1. □


Definition. It follows from (4) that $\det L = \pm 1$. The Lorentz transformations
that map the flc onto itself, and for which det L = 1, form the proper Lorentz group.


Theorem 2. Suppose L belongs to the proper Lorentz group, and maps the point
e = (1, 0, 0)' onto itself. Then L is rotation around the t axis.


Proof. Le = e implies that the first column of L is (1, 0, 0)'. According to (4),
L'ML = M; since Me = e, L'e = e; therefore the first column of L' is (1, 0, 0)'. So
the first row of L is (1, 0, 0). Thus L has the form


$$L = \begin{pmatrix} 1 & 0 \\ 0 & R \end{pmatrix},$$

where R is a 2 x 2 block.


Since L preserves the Minkowski metric, R is an isometry. Since det L = 1,
det R = 1; so R is a rotation. □


EXERCISE 2. Show that Lorentz transformations preserve solutions of the wave
equation. That is, if f(t, x, y) satisfies


$$f_{tt} - f_{xx} - f_{yy} = 0,$$

then f(L(t, x, y)) satisfies the same equation.


Next we shall present an explicit description of proper Lorentz transformations.
Given any point u = (t, x, y)', we represent it by the 2 x 2 symmetric matrix U:


$$U = \begin{pmatrix} t - x & y \\ y & t + x \end{pmatrix}. \qquad (5)$$


Clearly, U is real and symmetric and

$$\det U = t^2 - x^2 - y^2, \qquad \operatorname{tr} U = 2t. \qquad (6)$$


Let W be any 2 x 2 real matrix whose determinant equals 1. Define the 2 x 2 matrix
V by


$$WUW' = V. \qquad (7)$$

Clearly, V is real and symmetric and


$$\det V = (\det W)(\det U)(\det W') = \det U, \qquad (8)$$

since we have assumed that det W = 1. Denote the entries of V as


$$V = \begin{pmatrix} t' - x' & y' \\ y' & t' + x' \end{pmatrix}. \qquad (9)$$


Given W, (5), (7), and (9) define a linear transformation $(t, x, y) \to (t', x', y')$. It
follows from (6) and (8) that $t^2 - x^2 - y^2 = t'^2 - x'^2 - y'^2$. That is, each W
generates a Lorentz transformation. We denote it by $L_W$. Clearly, W and -W
generate the same Lorentz transformation. Conversely,


EXERCISE 3. Show that if W and Z generate the same Lorentz transformation, 
then Z = W or Z = —W. 


The 2 × 2 matrices W with real entries and determinant 1 form a group under
matrix multiplication, called the special linear group of order 2 over the reals. This
group is denoted as SL(2, R).


EXERCISE 4. Show that SL(2, R) is connected—that is, that every W in SL(2, R) 
can be deformed continuously within SL(2, R) into I. 


Formulas (5), (7), and (9) define a two-to-one mapping of SL(2, R) into the
(2 + 1)-dimensional Lorentz group. This mapping is a homomorphism, that is,

$$L_{WZ} = L_W L_Z. \qquad (10)$$


EXERCISE 5. Verify (10). 
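
The construction in (5), (7), and (9) can be tried out numerically. The following is a minimal sketch in plain Python, not a proof: the helper names (`to_matrix`, `apply_L`, `lorentz_matrix`) and the sample matrices `W`, `Z` are illustrative choices that do not come from the text. It builds the 3 × 3 matrix of the map generated by W and lets one check that the Minkowski form is preserved and that the homomorphism relation (10) holds for the chosen samples.

```python
# Sketch: the (2+1)-dimensional Lorentz map generated by a real 2x2 matrix W
# with det W = 1, via U -> W U W'. All names are illustrative assumptions.

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def to_matrix(u):
    """Formula (5): (t, x, y) -> [[t - x, y], [y, t + x]]."""
    t, x, y = u
    return [[t - x, y], [y, t + x]]

def from_matrix(V):
    """Read (t', x', y') off a symmetric 2x2 matrix written as in (9)."""
    t = (V[0][0] + V[1][1]) / 2
    x = (V[1][1] - V[0][0]) / 2
    return (t, x, V[0][1])

def apply_L(W, u):
    """Image of u under the map generated by W, computed as W U W'."""
    return from_matrix(matmul(matmul(W, to_matrix(u)), transpose(W)))

def lorentz_matrix(W):
    """3x3 matrix of the map: its columns are the images of the unit vectors."""
    cols = [apply_L(W, e) for e in ((1, 0, 0), (0, 1, 0), (0, 0, 1))]
    return transpose(cols)

def minkowski(u):
    t, x, y = u
    return t * t - x * x - y * y

W = [[2.0, 1.0], [3.0, 2.0]]   # sample matrix, det = 1
Z = [[1.0, 4.0], [0.5, 3.0]]   # sample matrix, det = 1
u = (3.0, 1.0, -2.0)           # sample point with t^2 - x^2 - y^2 = 4
```

Comparing `lorentz_matrix(matmul(W, Z))` with `matmul(lorentz_matrix(W), lorentz_matrix(Z))` entry by entry is a numerical spot check of (10) for these two matrices only; Exercise 5 asks for the general argument.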


Theorem 3. (a) For W in SL(2, R), L_W belongs to the proper Lorentz group.

(b) Given any two points u and v in the flc, satisfying (u, Mu) = (v, Mv), there is
a Y in SL(2, R) such that L_Y u = v.

(c) If Z is a rotation, L_Z is a rotation around the t axis.


Proof. (a) A symmetric matrix U representing a point u = (t, x, y)' in the flc is
positive, and the converse is also true. For according to (6), det U = t² − x² − y²,
tr U = 2t, and the positivity of both is equivalent to the positivity of the symmetric
matrix U.

By definitions (7) and (9), the matrix V representing v = L_W u is V = WUW';
clearly, if U is a positive symmetric matrix, so is V. This shows that L_W maps the flc
into the flc.

According to (4), the determinant of the Lorentz transformation L_W is 1 or −1.
When W is the identity I, L_W is the identity, and so det L_I = 1. During a continuous
deformation of W, det L_W changes continuously, so it doesn't change at all.
Therefore, det L_W = 1 for all W that can be deformed continuously into I. According
to Exercise 4, all W can be deformed into I; this shows that for all W, L_W is a proper
Lorentz transformation.

(b) We shall show that given any v in the flc, there is a W in SL(2, R) such that
L_W maps te = t(1, 0, 0)' into v, where t is the positive square root of (v, Mv). The
matrix representing e is I; the matrix representing L_W(te) is W(tI)W' = tWW'. So we
have to choose W so that tWW' = V, where V represents v. Since t² = (v, Mv) =
det V and since by (a) V is positive, we can satisfy the equation for W by setting W as
the positive square root of t⁻¹V.

Similarly, for any other point u in the flc for which (u, Mu) = (v, Mv), there is a Z
in SL(2, R) for which L_Z(te) = u. Then L_W L_Z⁻¹ maps u into v. Since W → L_W is a
homomorphism, L_W L_Z⁻¹ = L_{WZ⁻¹}.

(c) Suppose that Z is a rotation in R²; then Z'Z = I. Using the commutativity of
trace, we get from V = ZUZ' that

$$\operatorname{tr} V = \operatorname{tr} ZUZ' = \operatorname{tr} UZ'Z = \operatorname{tr} U$$

for all U. For U of form (5) and V of form (9), tr U = 2t, tr V = 2t', so t = t' for all U.
Since t² − x² − y² = t'² − x'² − y'², it follows that L_Z maps (t, 0, 0) into itself. We
appeal to Theorem 2 to conclude that L_Z is rotation around the t axis. □


EXERCISE 6. Show that if Z is rotation by angle θ, L_Z is rotation by angle 2θ.
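
The angle doubling in Exercise 6 can be observed numerically before it is proved. The sketch below is a plain-Python illustration under stated assumptions: `apply_L`, `rotation`, and `rotation_angle` are hypothetical helper names, and the test point is an arbitrary choice. It measures only the unsigned angle between the (x, y) parts of u and of its image, since the sense of rotation depends on orientation conventions.

```python
import math

def apply_L(W, u):
    """Image of u = (t, x, y): form U as in (5), compute W U W', read off (9)."""
    t, x, y = u
    (a, b), (c, d) = W
    U = [[t - x, y], [y, t + x]]
    # WU, then (WU) W'
    WU = [[a * U[0][0] + b * U[1][0], a * U[0][1] + b * U[1][1]],
          [c * U[0][0] + d * U[1][0], c * U[0][1] + d * U[1][1]]]
    V = [[WU[0][0] * a + WU[0][1] * b, WU[0][0] * c + WU[0][1] * d],
         [WU[1][0] * a + WU[1][1] * b, WU[1][0] * c + WU[1][1] * d]]
    return ((V[0][0] + V[1][1]) / 2, (V[1][1] - V[0][0]) / 2, V[0][1])

def rotation(theta):
    """Plane rotation by angle theta; it has determinant 1."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def rotation_angle(theta, u=(2.0, 1.0, 0.0)):
    """Return (t', angle), where angle in [0, pi] is measured between the
    (x, y) parts of u and of its image under the map generated by rotation(theta)."""
    t2, x2, y2 = apply_L(rotation(theta), u)
    _, x, y = u
    cosang = (x * x2 + y * y2) / (math.hypot(x, y) * math.hypot(x2, y2))
    return t2, math.acos(max(-1.0, min(1.0, cosang)))
```

For a sample value such as theta = 0.3, the first returned component stays equal to the original t, in line with part (c) of Theorem 3, and the measured angle can be compared with 2 * theta.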


Theorem 4. Every proper Lorentz transformation L is of the form L_Y, Y in
SL(2, R).


Proof. Denote by u the image of e = (1, 0, 0)' under L:

Le = u.

Since e lies in the flc, so does u. According to part (b) of Theorem 3, L_W e = u for
some W in SL(2, R). Therefore L_W⁻¹ L e = e; according to Theorem 2, L_W⁻¹ L is
rotation around the t axis. By part (c) of Theorem 3, along with Exercise 6, there
is a rotation Z in SL(2, R) such that L_W⁻¹ L = L_Z; it follows that
L = L_W L_Z = L_{WZ}. □


EXERCISE 7. Show that a symmetric 2 x 2 matrix is positive iff its trace and 
determinant are both positive. 


EXERCISE 8. (a) Let L(s) be a one-parameter family of Lorentz transformations
that depends differentiably on s. Show that L(s) satisfies a differential equation of the
form

$$\partial_s L = AML, \qquad (11)$$

where A(s) is anti-self-adjoint.
Proof. Follows by Theorem 3.3 and Theorem 4.5. □


Figure 99. [Triangle P with the vertices a = (0, 0) and b = (1, 0) marked, and the support cone cone(P, a).]


(4.7) Example. For the triangle P in Figure 99, there are three vertices a, b and
c and three support cones. For cone(P, a), we have

$$\sum_{m \in \operatorname{cone}(P, a) \cap \mathbb{Z}^2} x^m = \sum_{m_1, m_2 \ge 0} x_1^{m_1} x_2^{m_2} = \frac{1}{(1 - x_1)(1 - x_2)},$$

provided |x_1|, |x_2| < 1. Therefore, by Part 2 of Theorem 3.3, we have

$$F[\operatorname{cone}(P, a)] = \frac{1}{(1 - x_1)(1 - x_2)}.$$

The support cone at b, translated to the origin, is spanned by the vectors a − b =
(−1, 0) and c − b = (−1, 1). Since we have

$$\det \begin{pmatrix} -1 & -1 \\ 0 & 1 \end{pmatrix} = -1,$$

by Corollary VII.2.6 the fundamental parallelepiped of a − b and c − b does not
contain any lattice point other than the origin. Therefore, by Lemma 1.2 we have

$$\sum_{m \in \operatorname{cone}(P, b) \cap \mathbb{Z}^2} x^m = \sum_{n_1, n_2 \ge 0} x^{b + n_1 (a - b) + n_2 (c - b)} = \frac{x_1}{(1 - x_1^{-1})(1 - x_1^{-1} x_2)}.$$


F| conet P. a)| 
Since (u, p) = 0,

st = −(ax + by).


Applying the Schwarz inequality on the right gives the opposite of the previous
inequality, a contradiction.

Given two distinct points u and v in H, there is a unique p, except for a constant
multiple, that satisfies (u, p) = 0, (v, p) = 0. According to what we have shown
above, p satisfies (13) when u or v belongs to H.

(c) Take the line consisting of all points u in H that satisfy (u, p) = 0, where p is
a given vector, (p, Mp) < 0. Let L be a proper Lorentz transformation; the inverse
image of the line under L consists of all points v such that u = Lv. These points v
satisfy (Lv, p) = (v, L'p) = 0. We claim that the points v lie on H, and that q = L'p
satisfies (q, Mq) < 0. Both of these assertions follow from the properties of proper
Lorentz transformations. □


Next we verify that our geometry is non-Euclidean. Take all lines (u, p) = 0
through the point u = (1, 0, 0). Clearly, such a p is of the form p = (0, a, b), and the
points u = (t, x, y) on the line (u, p) = 0 satisfy

ax + by = 0. (14)

Take q = (1, 1, 1); points u = (t, x, y) on the line (u, q) = 0 satisfy t + x + y = 0.
For such a point u,

$$(u, Mu) = t^2 - x^2 - y^2 = (x + y)^2 - x^2 - y^2 = 2xy. \qquad (15)$$

The points u on the intersection of the two lines satisfy both (u, p) = 0
and (u, q) = 0. If a and b are of the same sign, it follows from (14) that x and y
are of the opposite sign; it follows from (15) that such a u does not lie in the
flc.

Thus there are infinitely many lines through the point (1, 0, 0) that do not intersect
the line t + x + y = 0 in H; this violates the parallel postulate of Euclidean
geometry.

In our geometry, the proper Lorentz transformations are the analogues of
Euclidean motions, that is, translations combined with rotations. Both objects form a
three-parameter family.

We turn now to the definition of distance in our geometry. Take two nearby
points in H, denoted as (t, x, y) and (t + dt, x + dx, y + dy). Their image under
a proper Lorentz transformation L is (t', x', y') and (t' + dt', x' + dx', y' + dy').
Since L is linear, (dt', dx', dy') is the image of (dt, dx, dy) under L, and
therefore
an invariant quadratic form. Since for points of H, t² − x² − y² = 1, we have
dt = (x/t) dx + (y/t) dy. So we choose as invariant metric

$$dx^2 + dy^2 - dt^2 = \frac{1 + y^2}{t^2}\, dx^2 + \frac{1 + x^2}{t^2}\, dy^2 - \frac{2xy}{t^2}\, dx\, dy. \qquad (16)$$

Once we have a metric, we can define the angle between lines at their point of
intersection using the metric (16).

3. In this section we shall briefly outline the theory of Lorentz transformations in 
(3 + 1)-dimensional space-time. The details of the results and their proofs are 
analogous to those described in Section 1. 

The Minkowski metric in 3 + 1 dimensions is t² − x² − y² − z², and a Lorentz
transformation is a linear map that preserves this metric. The Minkowski metric can
be expressed, analogously with (2), as

(u, Mu), (17)

where M is the 4 × 4 diagonal matrix whose diagonal entries are 1, −1, −1, and −1,
and u denotes a point (t, x, y, z) in (3 + 1)-dimensional space-time. The forward
light cone is defined, as before, as the set of points u for which (u, Mu) > 0 and
t > 0.

A Lorentz transformation is represented by a matrix L that satisfies the four- 
dimensional analogue of equation (4): 


L'ML = M. (18) 


A proper Lorentz transformation is one that maps the flc onto itself and
whose determinant det L equals 1. The proper Lorentz transformations form a
group.

Just as in the (2 + 1)-dimensional case, proper Lorentz transformations in 3 + 1
space can be described explicitly. We start by representing vectors u = (t, x, y, z) in
3 + 1 space by complex-valued self-adjoint 2 × 2 matrices

$$U = \begin{pmatrix} t - x & y + iz \\ y - iz & t + x \end{pmatrix}. \qquad (19)$$

The Minkowski metric of u can be expressed as

$$t^2 - x^2 - y^2 - z^2 = \det U. \qquad (20)$$


Let W be any complex-valued 2 × 2 matrix of determinant 1. Define the 2 × 2
matrix V by

$$V = WUW^*, \qquad (21)$$




where W* is the adjoint of W, and U is defined by (19). Clearly, V is self-adjoint, so it
can be written as

$$V = \begin{pmatrix} t' - x' & y' + iz' \\ y' - iz' & t' + x' \end{pmatrix}. \qquad (22)$$

Given W, (19), (21), and (22) define a linear map (t, x, y, z) → (t', x', y', z'). Take the
determinant of (21):

$$\det V = (\det W)(\det U)(\det W^*).$$

Using (20) for both U and V, and det W = 1, it follows that

$$t'^2 - x'^2 - y'^2 - z'^2 = t^2 - x^2 - y^2 - z^2.$$

This shows that each W generates a Lorentz transformation. We denote it as L_W.
The complex-valued 2 × 2 matrices of determinant 1 form a group denoted as
SL(2, C).

Theorem 6. (a) For every W in SL(2, C), L_W defined above is a proper Lorentz
transformation.

(b) The mapping W → L_W is a homomorphic map of SL(2, C) onto the proper
Lorentz group. This mapping is 2 to 1.

We leave it to the reader to prove this theorem using the techniques developed in
Section 1.
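
As a companion to the outline above, the (3 + 1)-dimensional construction can be sketched with Python's built-in complex numbers. This is only a numerical illustration under stated assumptions: the function names and the sample matrix `W` (chosen by hand so that its determinant is 1) are not from the text. It applies (19), (21), and (22) to a point and lets one check that the Minkowski form is unchanged and that W and −W give the same map.

```python
# Sketch: the (3+1)-dimensional Lorentz map generated by a complex 2x2 matrix W
# with det W = 1, via U -> W U W*. Names are illustrative assumptions.

def to_matrix(u):
    """Formula (19): (t, x, y, z) -> self-adjoint 2x2 matrix."""
    t, x, y, z = u
    return [[complex(t - x), complex(y, z)],
            [complex(y, -z), complex(t + x)]]

def adjoint(W):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(W[j][i]).conjugate() for j in range(2)] for i in range(2)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def apply_L(W, u):
    """Image of u, read off W U W* written in the form (22)."""
    V = matmul(matmul(W, to_matrix(u)), adjoint(W))
    t = (V[0][0] + V[1][1]).real / 2
    x = (V[1][1] - V[0][0]).real / 2
    return (t, x, V[0][1].real, V[0][1].imag)

def minkowski(u):
    t, x, y, z = u
    return t * t - x * x - y * y - z * z

# Sample matrix: (1 + i)(1.5 + 0.5i) - 2i = 1, so det W = 1.
W = [[1 + 1j, 2 + 0j], [1j, 1.5 + 0.5j]]
u = (3.0, 1.0, -1.0, 0.5)   # sample point in the forward light cone
```

Evaluating `minkowski(apply_L(W, u))` and `minkowski(u)` side by side gives a quick sanity check of the invariance for this one sample; it says nothing about part (b) of Theorem 6, which needs the argument of Section 1.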

4. In this section we shall establish a relation between the group SU(2, C) of 2 × 2
unitary matrices and SO(3, R), the group of rotations in R³.
We represent a point (x, y, z) of R³ by the 2 × 2 matrix

$$U = \begin{pmatrix} x & y + iz \\ y - iz & -x \end{pmatrix}. \qquad (23)$$

Clearly,

$$-\det U = x^2 + y^2 + z^2. \qquad (24)$$

The matrices U are 2 × 2 self-adjoint matrices, trace of U = 0.


Theorem 7. Z is a 2 × 2 unitary matrix of determinant 1, and U is as above.
Then

$$V = ZUZ^* \qquad (25)$$
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contains some negative terms and does depend on k. We are interested in the 
constant term of the expansion of (5.1.2). Collecting the terms, we conclude that 
the constant term of (5.1.2) is 


d 


ey)! 
y, x Or; a — f. 


E 


which is a polynomial in k. Equating the constant terms in (5.1.1). we get 


m d git 
[AP OZ*| = Y y es 054-4 = poly(k), 


i=] f=0 l 


which completes the proof. □


Figure 100. Example: a triangle P and its dilations 2P and 3P. One
can observe that |kP ∩ Z²| = k²/2 + 3k/2 + 1.
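
The count in the caption can be reproduced by brute force. The sketch below assumes, since the figure itself is not available here, that P is the triangle with vertices (0, 0), (1, 0), and (0, 1); the function names are illustrative. It counts lattice points in kP directly and compares with k²/2 + 3k/2 + 1, and it also counts interior points, which is the quantity that appears in the reciprocity relation of Problem 3 below.

```python
# Sketch under an assumption: P is the triangle with vertices (0,0), (1,0), (0,1),
# so kP = {(x, y) : x >= 0, y >= 0, x + y <= k}.

def lattice_points(k):
    """Number of integer points in the dilated triangle kP."""
    return sum(1 for x in range(k + 1) for y in range(k + 1 - x))

def interior_lattice_points(k):
    """Number of integer points with x > 0, y > 0, x + y < k."""
    return sum(1 for x in range(1, k) for y in range(1, k - x))

def ehrhart(k):
    """The polynomial k^2/2 + 3k/2 + 1, evaluated exactly (k may be negative)."""
    return (k * k + 3 * k + 2) // 2
```

Since k² + 3k + 2 = (k + 1)(k + 2) is always even, the integer division is exact. Evaluating `ehrhart(-k)` against `interior_lattice_points(k)` for a few k illustrates the reciprocity relation for this two-dimensional example, where the sign factor equals 1.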


PROBLEMS. 


1. Let P ⊂ R^d be an integer polytope and let p(k) = |kP ∩ Z^d| be its Ehrhart
polynomial. Prove that deg p = dim P.


2. Deduce the existence of the Ehrhart polynomial for integer polygons from
Pick's Formula; see Problem 6 of Section VII.2.6. Prove that in the case of integer
polygons the coefficients of the Ehrhart polynomial are non-negative.


3. Let P ⊂ R^d be an integer polytope and let p be its Ehrhart polynomial.
Prove that for any positive integer k we have

$$p(-k) = (-1)^{\dim P}\, |\operatorname{int}(kP) \cap \mathbb{Z}^d|,$$

where the interior of a polytope is considered with respect to its affine hull (the
reciprocity relation).


Hint: Use Problem 1 of Section 4.7 and Problem 1 of Lemma 2.4. 


APPENDIX 12 


Compactness of the Unit Ball 


In this Appendix we shall present examples of Euclidean spaces X whose unit ball is
compact, that is, where every sequence {x_n} of vectors in X, ‖x_n‖ ≤ 1, has a
convergent subsequence. According to Theorem 17 of Chapter 7, such spaces are
finite dimensional. Thus compactness of the unit ball is an important criterion for
finite dimensionality.

Let G be a bounded domain in the x, y plane whose boundary is smooth. Let
u(x, y) be a twice differentiable function that satisfies in G the partial differential
equation

$$au + \Delta u = 0, \qquad (1)$$

where a is a positive constant, and Δ is the Laplace operator:

$$\Delta u = u_{xx} + u_{yy}; \qquad (2)$$

here subscripts x, y denote partial derivatives with respect to these variables.
Denote by S the set of solutions of equation (1) which in addition are zero on the
boundary of G:

$$u(x, y) = 0 \quad \text{for } (x, y) \text{ in } \partial G. \qquad (3)$$

Clearly, the set S of such solutions forms a linear space.
We define for u in S the norm

$$\|u\|^2 = \int_G u^2(x, y)\, dx\, dy. \qquad (4)$$
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6. Example: Totally Unimodular Polytopes 


In one special case, Brion's Theorem (Corollary 4.6) gives a particularly succinct 
representation of the generating function. 


(6.1) Definition. Let u_1, …, u_k ∈ Z^d be linearly independent vectors and let
K = co(u_1, …, u_k). Let L_K = span(u_1, …, u_k) and let Λ_K = Z^d ∩ L_K. We say
that K is a unimodular cone provided u_1, …, u_k is a basis of Λ_K, considered as a
lattice in L_K. We call u_1, …, u_k generators of K.

Let P ⊂ R^d be an integer polytope. We say that P is totally unimodular
provided the support cone at every vertex of P is a translation of a unimodular
cone.


Some important polytopes are totally unimodular. 


PROBLEMS. 
1. Let Δ be the standard (d − 1)-dimensional simplex in R^d:

$$\Delta = \Big\{ (\xi_1, \ldots, \xi_d) : \sum_{i=1}^{d} \xi_i = 1 \text{ and } \xi_i \ge 0 \text{ for } i = 1, \ldots, d \Big\}.$$

Prove that Δ is a totally unimodular polytope.

2. Let us fix positive integers m and n and let us identify R^d, d = mn, with
the space of m × n real matrices (ξ_ij). Thus Z^d is identified with the space of all
m × n integer matrices. Let us fix positive integers α_1, …, α_m and β_1, …, β_n and
let P ⊂ R^d be the polyhedron of all non-negative m × n matrices with row sums
α_1, …, α_m and column sums β_1, …, β_n. Suppose that dim P = (m − 1)(n − 1) and
that P is a simple polytope; see Definition VI.5.1 (which means that α_1, …, α_m
and β_1, …, β_n are chosen in a sufficiently generic way). Prove that P is a totally
unimodular polytope.

Remark: More generally, a sufficiently generic transportation polytope (see
Section II.7) is totally unimodular. Non-negative integer matrices with prescribed
row and column sums are called contingency tables.

3. Let u_1, …, u_k ∈ Z^d be linearly independent lattice points and let K =
co(u_1, …, u_k). Prove that K can be dissected into the union of unimodular cones,
that is, there is a decomposition

$$K = \bigcup_{i} K_i,$$

where each cone K_i is unimodular and the intersection K_i ∩ K_j of every two distinct
cones K_i and K_j is a proper face of both.

4. Let K ⊂ R^d be a unimodular cone such that dim K = d. Prove that the
polar K° ⊂ R^d is a unimodular cone.


For totally unimodular polytopes, Brion's Theorem (Corollary 4.6) gives a par- 
ticularly nice identity. 
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Theorem 3. Let G be a bounded domain with a smooth boundary, and let D be a 
set of functions in G whose values and the values of their first derivatives are 
uniformly bounded in G by a common bound m. Every sequence of functions in D 
contains a subsequence that converges in the maximum norm. 


EXERCISE 1. (i) Show that a set of functions whose first derivatives are
uniformly bounded in G is equicontinuous in G.
(ii) Use (i) and the Arzela-Ascoli theorem to prove Theorem 3.


APPENDIX 13 


A Characterization of Commutators 


We shall prove the following result. 


Theorem 1. An n × n matrix X is the commutator of two n × n matrices A
and B,

$$X = AB - BA, \qquad (1)$$

iff the trace of X is zero.

Proof. We have shown in Theorem 7 of Chapter 5 that trace is commutative,
that is, that

$$\operatorname{tr} AB = \operatorname{tr} BA.$$

It follows that for X of form (1), tr X = 0. We show now the converse. □


Lemma 2. Every matrix X all of whose diagonal entries are zero can be
represented as a commutator.


Proof. We shall construct explicitly a pair of matrices A and B so that (1) holds.
We choose arbitrarily n distinct numbers a_1, …, a_n and define A to be the diagonal
matrix with diagonal entries a_i:

$$A_{ij} = \begin{cases} 0 & \text{for } i \ne j, \\ a_i & \text{for } i = j. \end{cases}$$

We define B as

$$B_{ij} = \begin{cases} \dfrac{X_{ij}}{a_i - a_j} & \text{for } i \ne j, \\ \text{anything} & \text{for } i = j. \end{cases}$$


Linear Algebra and its Applications, Second Edition, by Peter D. Lax 
Copyright © 2007 John Wiley & Sons, Inc. 


Then for i ≠ j

$$(AB - BA)_{ij} = a_i B_{ij} - B_{ij} a_j = (a_i - a_j) B_{ij} = X_{ij},$$

while

$$(AB - BA)_{ii} = a_i B_{ii} - B_{ii} a_i = 0.$$

This verifies (1). □
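
The construction in the proof of Lemma 2 is explicit enough to run. The following sketch uses exact rational arithmetic from Python's standard `fractions` module; the function names, the default choice a_i = i, and the choice of 0 for the free diagonal entries of B are illustrative assumptions, not part of the text.

```python
from fractions import Fraction

def commutator_factors(X, a=None):
    """Given a square matrix X with zero diagonal, return (A, B) with AB - BA = X.

    A is diagonal with distinct entries a_i; B_ij = X_ij / (a_i - a_j) off the
    diagonal. The diagonal of B is arbitrary; this sketch sets it to 0.
    """
    n = len(X)
    if a is None:
        a = list(range(1, n + 1))          # any n distinct numbers will do
    A = [[Fraction(a[i]) if i == j else Fraction(0) for j in range(n)]
         for i in range(n)]
    B = [[Fraction(0) if i == j else Fraction(X[i][j]) / (a[i] - a[j])
          for j in range(n)] for i in range(n)]
    return A, B

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def commutator(A, B):
    AB, BA = matmul(A, B), matmul(B, A)
    n = len(A)
    return [[AB[i][j] - BA[i][j] for j in range(n)] for i in range(n)]

X = [[0, 2, -1],
     [5, 0, 3],
     [7, -4, 0]]                            # sample matrix with zero diagonal
A, B = commutator_factors(X)
```

Because the arithmetic is exact, `commutator(A, B)` can be compared with `X` using plain equality. Passing a different list of distinct numbers as `a` changes A and B but not their commutator.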


To complete the proof of Theorem 1 we make use of the observation that if X can
be represented as a commutator, so can any matrix similar to X. This can be seen
formally by multiplying equation (1) by S on the left and S⁻¹ on the right:

$$SXS^{-1} = SABS^{-1} - SBAS^{-1} = (SAS^{-1})(SBS^{-1}) - (SBS^{-1})(SAS^{-1}).$$


Conceptually, we are using the observation that similar matrices represent the same 
mapping but in different coordinate systems. 


Lemma 3. Every matrix X whose trace is zero is similar to a matrix all of whose
diagonal entries are zero.


Proof. Suppose not all diagonal entries of X are zero, say x_11 ≠ 0. Then, since
tr X = 0, there must be another diagonal entry, say x_22, that is neither zero nor equal
to x_11. Therefore the 2 × 2 minor in the upper left corner of X,

$$Y = \begin{pmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{pmatrix},$$

is not a multiple of the identity. Therefore there is a vector h with two components
such that Yh is not a multiple of h. We introduce now h and Yh as new basis in R²;
with respect to this basis Y is represented by a matrix whose first diagonal element
is zero.

Continuing in this fashion we make changes of variables in two-dimensional
subspaces that introduce a new zero on the diagonal of the matrix representing X,
without destroying any of the zeros that are already there, until there are n − 1
zeros on the diagonal. But since tr X = 0, the remaining diagonal element is
zero too. □


Combining Lemma 2 and Lemma 3 gives Theorem 1. 


APPENDIX 14 


Liapunov's Theorem 


In this Appendix we give a far-reaching extension of Theorem 20 in Chapter 10. We
start by replacing Z in that result by its negative, W = −Z.


Theorem 1. Let W be a mapping of a finite-dimensional Euclidean space into
itself whose self-adjoint part is negative:

$$W + W^* < 0. \qquad (1)$$

Then the eigenvalues of W have negative real part.


This can be proved the same way as Theorem 20 was in Chapter 10. We state now
a generalization of this result.


Theorem 2. Let W be a mapping of a finite-dimensional Euclidean space X into
itself. Let G be a positive self-adjoint map of X into itself that satisfies the inequality

$$GW + W^*G < 0. \qquad (2)$$

Then the eigenvalues of W have negative real part.

Proof. Let h be an eigenvector of W, where w is the corresponding eigenvalue:

$$Wh = wh. \qquad (3)$$

Let the left-hand side of (2) act on h, and take the scalar product with h; according
to (2),

$$((GW + W^*G)h, h) < 0. \qquad (4)$$
which are of the form e^w, are less than 1 in absolute value. But then the spectral
radius of e^W, the maximum of all |e^w|, is also less than 1:

$$r(e^W) < 1. \qquad (7)$$

We conclude from (6) applied to A = e^W that

$$\|e^{Wj}\| \le \big(r(e^W) + \epsilon\big)^j, \qquad (8)$$

where ε tends to zero as j → ∞. It follows from (7) and (8) that ‖e^{Wt}‖ decays
exponentially as t → ∞ through integer values.

For t not an integer, we decompose t as t = j + f, where j is an integer and f is
between 0 and 1, and we factor e^{Wt} as

$$e^{Wt} = e^{Wj} e^{Wf}.$$

So

$$\|e^{Wt}\| \le \|e^{Wj}\|\, \|e^{Wf}\|. \qquad (9)$$

To estimate ‖e^{Wf}‖ we replace in (5) W by Wf,

$$e^{Wf} = \sum_{n=0}^{\infty} \frac{(Wf)^n}{n!},$$

and apply the additive and multiplicative estimates for the norm of a matrix:

$$\|e^{Wf}\| \le \sum_{n=0}^{\infty} \frac{\|W\|^n f^n}{n!} = e^{\|W\| f}.$$

Since f lies between 0 and 1,

$$\|e^{Wf}\| \le e^{\|W\|}. \qquad (10)$$

We can use (8) and (10) to estimate the right-hand side of (9); using j = t − f, we
get

$$\|e^{Wt}\| \le \big(r(e^W) + \epsilon\big)^{t - f} e^{\|W\|}, \qquad (11)$$

where ε tends to zero as t → ∞. According to (7), r = r(e^W) < 1; thus it follows from
(11) that ‖e^{Wt}‖ decays to zero at an exponential rate as t tends to ∞. □
We show now that G, as defined by (12), has the three properties required in
Theorem 3:
(i) G is self-adjoint.
(ii) G is positive.
(iii) GW + W*G is negative.


To show (i), we note that the adjoint of a mapping defined by an integral of the form
(13) is

$$\int A^*(t)\, dt,$$

taken over the same interval. It follows that if the integrand A(t) is self-adjoint for
each value of t, then so is the integral (13). Since the integrand in (12) is self-adjoint,
so is the integral G, as asserted in (i).

To show (ii), we make use of the observation that for an integral of form (13) and
for any vector h,

$$\Big(h, \int A(t)\, dt\; h\Big) = \int (h, A(t)h)\, dt.$$

It follows that if the integrand A(t) is self-adjoint and positive, so is the integral (13).
Since the integrand in (12) is self-adjoint and positive,

$$(h, e^{W^*t} e^{Wt} h) = (e^{Wt}h, e^{Wt}h) = \|e^{Wt}h\|^2 > 0,$$

so is the integral G, as asserted in (ii). To prove (iii), we apply the factors W and W*
under the integral sign:

$$GW + W^*G = \int_0^\infty \big( e^{W^*t} e^{Wt} W + W^* e^{W^*t} e^{Wt} \big)\, dt. \qquad (15)$$

Next we observe that the integrand on the right is a derivative:

$$\frac{d}{dt}\big( e^{W^*t} e^{Wt} \big). \qquad (16)$$

To see this, we use the rule for differentiating the product of e^{W*t} and e^{Wt}:

$$\frac{d}{dt}\big( e^{W^*t} e^{Wt} \big) = \Big( \frac{d}{dt} e^{W^*t} \Big) e^{Wt} + e^{W^*t} \frac{d}{dt} e^{Wt}. \qquad (17)$$
We combine this with the rule for differentiating exponential functions (see
Theorem 5 of Chapter 9):

$$\frac{d}{dt} e^{Wt} = e^{Wt} W, \qquad \frac{d}{dt} e^{W^*t} = W^* e^{W^*t}.$$

Setting these into (17) shows that (16) is indeed the integrand in (15):

$$GW + W^*G = \int_0^\infty \frac{d}{dt}\big( e^{W^*t} e^{Wt} \big)\, dt. \qquad (15)'$$

We apply now the fundamental theorem of calculus to evaluate the integral on the
right of (15)' as

$$e^{W^*t} e^{Wt} \Big|_0^\infty = -I.$$

Thus we have

$$GW + W^*G = -I,$$

a negative self-adjoint mapping, as claimed in (iii). This completes the proof of
Theorem 3. □

The proof, and therefore the theorem, holds in infinite-dimensional Euclidean 
spaces. 
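
The identity GW + W*G = −I can be checked numerically for a small example. The sketch below makes several assumptions that are not in the text: G is taken to be the integral of e^{W*t} e^{Wt} over t from 0 to infinity, truncated at a finite T; W is a real 2 × 2 sample matrix with eigenvalues −1 and −3, so that W* is the transpose; the integral is approximated by the trapezoid rule; and all function names are illustrative.

```python
# Sketch: approximate G = integral_0^T e^{W't} e^{Wt} dt by the trapezoid rule
# for a real matrix W whose eigenvalues have negative real part.

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(A):
    return [list(row) for row in zip(*A)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def expm_small(W, h, terms=20):
    """Taylor series for e^{Wh}; intended only for small h * ||W||."""
    n = len(W)
    result, term = identity(n), identity(n)
    for m in range(1, terms):
        term = [[v * h / m for v in row] for row in matmul(term, W)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def lyapunov_G(W, h=0.005, T=30.0):
    n = len(W)
    step = expm_small(W, h)          # e^{Wh}
    P = identity(n)                  # holds e^{W t_k}, t_k = k h
    steps = int(round(T / h))
    G = [[0.0] * n for _ in range(n)]
    for k in range(steps + 1):
        integrand = matmul(transpose(P), P)
        weight = h / 2 if k in (0, steps) else h
        for i in range(n):
            for j in range(n):
                G[i][j] += weight * integrand[i][j]
        P = matmul(step, P)          # advance to e^{W t_{k+1}}
    return G

W = [[-1.0, 2.0], [0.0, -3.0]]       # sample matrix, eigenvalues -1 and -3
G = lyapunov_G(W)
GW, WtG = matmul(G, W), matmul(transpose(W), G)
residual = [[GW[i][j] + WtG[i][j] + (1.0 if i == j else 0.0) for j in range(2)]
            for i in range(2)]
```

For this sample W, solving GW + W*G = −I by hand gives G with entries 1/2, 1/4, 1/4, 1/3, which the quadrature should approach as h shrinks and T grows; the entries of `residual` should then be close to zero.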


APPENDIX 15 


The Jordan Canonical Form 


In this appendix we present a proof of the converse part of Theorem 12 in Chapter 6: 


Theorem 1. Let A and B be a pair of n × n matrices with the following
properties.


(i) A and B have the same eigenvalues c_1, …, c_k.
(ii) The dimensions of the nullspaces

N_m(c_j) = nullspace of (A − c_j I)^m

and

M_m(c_j) = nullspace of (B − c_j I)^m

are equal for all c_j and all m:

dim N_m(c_j) = dim M_m(c_j). (1)

Then A and B are similar.


Proof. In Theorem 12 of Chapter 6, we have shown that these conditions are
necessary for A and B to be similar. We show now that they are sufficient by
introducing a special basis in which the action of A is particularly simple and
depends only on the eigenvalues c_j and the dimensions (1). We shall deal with each
eigenvalue c_j separately; for simplicity we take c = c_j to be zero. This can be
accomplished by subtracting cI from A; at the end we shall add cI back.
The nullspaces of A^m are nested:

$$N_1 \subset N_2 \subset \cdots \subset N_d,$$

where d is the index of the eigenvalue c = 0.


Lemma 2. A maps the quotient space N_{i+1}/N_i into N_i/N_{i−1}, and this mapping
is one-to-one.


Proof. A maps N_{i+1} into N_i; therefore A maps N_{i+1}/N_i into N_i/N_{i−1}. Let {x} be
a nonzero equivalence class in N_{i+1}/N_i; this means that no x belongs to N_i. It
follows that Ax does not belong to N_{i−1}; this shows that A{x} = {Ax} is not the zero
class in N_i/N_{i−1}. This proves Lemma 2. □


It follows from Lemma 2 that

$$\dim (N_{i+1}/N_i) \le \dim (N_i/N_{i-1}). \qquad (2)$$

The special basis for A in N_d will be introduced in batches. The first batch,

$$x_1, \ldots, x_{l_0}, \qquad l_0 = \dim (N_d/N_{d-1}), \qquad (3)$$

are any l_0 vectors in N_d that are linearly independent mod N_{d−1}. The next batch
is

$$Ax_1, \ldots, Ax_{l_0}; \qquad (4)$$

these belong to N_{d−1}, and are linearly independent mod N_{d−2}. According to (2), with
i = d − 1, dim (N_{d−1}/N_{d−2}) ≥ dim (N_d/N_{d−1}) = l_0. We choose the next batch of
basis vectors in N_{d−1},

$$x_{l_0 + 1}, \ldots, x_{l_1}, \qquad (5)$$

where l_1 = dim (N_{d−1}/N_{d−2}), to complete the vectors (4) to a basis of N_{d−1}/N_{d−2}.
The next batch is

$$A^2 x_1, \ldots, A^2 x_{l_0}, \; A x_{l_0 + 1}, \ldots, A x_{l_1}. \qquad (6)$$

The next batch,

$$x_{l_1 + 1}, \ldots, x_{l_2}, \qquad (7)$$


Index 


pealystochastic, til 
positive definite, 79 
positive semidefinite, TH 
Hermitian, «1 
meas 
Borel probability, 22. 144 
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norm, 119, 216 
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parallel subspace, 42 
permutohedron, 256 
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dual, 153 
primal, 163 
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dual, 163 
primal, 163 
point, 1, 5 
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polar, 143, 156 
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rational, 440 
transportation. G1 
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palvtopal complex, 261 
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24-cell, 147 
Birkhoff, 57, 148 
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cyclic, 262 


integer, X) 
multiindex transportation, M 
pennutation, Dt 
self-dual, 147 
simple, id 
simplicial, 264 
totally unimodular, 353 
Lransportation, 61 
Traveling Salesman, 213 
principle 
discretisation, 1:5 
maximum, 189 
problem 
Assignment, 58 
eyeloheptane, 04 
eyelohexane, 8 
Diet, 55, 175 
Masa- Tranafer, 1926 
mun-cost, PUJ 
of linear programming, 45, 128, 163 
dual. 163 
in the canonical form, lit 
in the standard form, Ha 
primal, bi 
af uniform (Chebyshev) approximation, 24, 
15H 
Transportation, 64, 176 
Waring's, 15 
projection. 44 
projective plane, 145 


randomized rounding, 85 

ray. t" 

reciprocity relation, 329. EH, 351 
ridge, 252 


scalar product, 2 
in the space of polynomials, 16 
scaling, &, 111 
aemilelinie programming, 179 
semigroup, 282, EHL 337 
ant 
balanced, 111 
closed, 1160 
compact. 114 
extreme, 121 
open, [05 
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dimensional, 4 
standard, 9 
zpr 
R=, 155 
Rao. AT, 49, 108, 117, 155 
dual. 115 
Euclidean, 1 
subgroup of, 275 
normed, 119 
Recall that in order to simplify the presentation of the special basis we have
replaced A by A − cI. Putting back what we have subtracted leads to the following
matrix form of the action of A:

$$\begin{pmatrix} c & 1 & 0 & \cdots & 0 \\ 0 & c & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & c & 1 \\ 0 & \cdots & 0 & 0 & c \end{pmatrix},$$

that is, each entry along the main diagonal is the eigenvalue c, with 1's along the
superdiagonal directly above it, and zeros everywhere else. A matrix of this form is
called a Jordan block; when all Jordan blocks are put together, the resulting matrix is
called a Jordan representation of the mapping A.

The Jordan representation of A depends only on the eigenvalues of A and the
dimension of the generalized eigenspaces N_j(a_k), j = 1, …, d_k, k = 1, …. Therefore
two matrices that have the same eigenvalues and the same-dimensional
eigenspaces and generalized eigenspaces have the same Jordan representation. This
shows that they are similar. □


APPENDIX 16 


Numerical Range 


Let X be a Euclidean space over the complex numbers, and let A be a mapping of X 
into X. 


Definition. The numerical range of A is the set of complex numbers

$$(Ax, x), \qquad \|x\| = 1.$$

Note that the eigenvalues of A belong to its numerical range.


Definition. The numerical radius w(A) of A is the supremum of the absolute
values in the numerical range of A:

$$w(A) = \sup_{\|x\| = 1} |(Ax, x)|. \qquad (1)$$

Since the eigenvalues of A belong to its numerical range, the numerical radius of A
is ≥ its spectral radius:

$$r(A) \le w(A). \qquad (2)$$

EXERCISE 1. Show that for A normal, equality holds in (2).
EXERCISE 2. Show that for A normal,

$$w(A) = \|A\|. \qquad (3)$$
Lemma 1. (i) w(A) ≤ ‖A‖.

(ii) ‖A‖ ≤ 2w(A). (4)


Proof. By the Schwarz inequality we have

$$|(Ax, x)| \le \|Ax\|\, \|x\| \le \|A\|\, \|x\|^2;$$

since ‖x‖ = 1, part (i) follows.
(ii) Decompose A into its self-adjoint and anti-self-adjoint parts:

$$A = S + iT.$$

Then

$$(Ax, x) = (Sx, x) + i(Tx, x)$$

splits (Ax, x) into its real and imaginary parts; therefore

$$|(Ax, x)| \ge |(Sx, x)|, \qquad |(Ax, x)| \ge |(Tx, x)|.$$

Taking the supremum over all unit vectors x gives

$$w(A) \ge w(S), \qquad w(A) \ge w(T). \qquad (5)$$

Since S and T are self-adjoint,

$$w(S) = \|S\|, \qquad w(T) = \|T\|.$$

Adding the two inequalities (5), we get

$$2w(A) \ge \|S\| + \|T\|.$$

Since ‖A‖ ≤ ‖S‖ + ‖T‖, (4) follows. □
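
The two bounds of Lemma 1 can be explored numerically for 2 × 2 matrices. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the numerical radius is only estimated by sampling unit vectors on a grid, and the sample matrix (a nilpotent 2 × 2 matrix) is an arbitrary choice, not taken from the text. Every unit vector in C² equals, up to an overall phase that does not affect (Ax, x), a vector of the form (cos a, e^{iφ} sin a), which is what the grid runs over.

```python
import cmath
import math

def numerical_radius(A, steps=200):
    """Grid estimate of w(A) = sup |(Ax, x)| over unit vectors x in C^2."""
    best = 0.0
    for i in range(steps + 1):
        a = (math.pi / 2) * i / steps
        for j in range(steps):
            phi = 2 * math.pi * j / steps
            x = (complex(math.cos(a)), cmath.exp(1j * phi) * math.sin(a))
            Ax = (A[0][0] * x[0] + A[0][1] * x[1],
                  A[1][0] * x[0] + A[1][1] * x[1])
            value = abs(Ax[0] * x[0].conjugate() + Ax[1] * x[1].conjugate())
            best = max(best, value)
    return best

def operator_norm(A):
    """||A|| for a 2x2 matrix: square root of the largest eigenvalue of A*A."""
    m00 = sum(abs(A[k][0]) ** 2 for k in range(2))
    m11 = sum(abs(A[k][1]) ** 2 for k in range(2))
    m01 = sum(complex(A[k][0]).conjugate() * A[k][1] for k in range(2))
    tr, det = m00 + m11, m00 * m11 - abs(m01) ** 2
    return math.sqrt((tr + math.sqrt(max(tr * tr - 4 * det, 0.0))) / 2)

A = [[0, 1], [0, 0]]        # sample nilpotent matrix (an illustrative choice)
w = numerical_radius(A)
norm = operator_norm(A)
```

For this sample, (Ax, x) has absolute value |cos a sin a|, so the estimate of w(A) is 1/2 while the norm is 1. The factor 2 in part (ii) therefore cannot be improved in general.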
Paul Halmos conjectured the following:

Theorem 2. For A as above,

$$w(A^n) \le w(A)^n \qquad (6)$$

for every positive integer n.
The first proof of this result was given by Charles Berger. The remarkably simple
proof presented here is Carl Pearcy's.


Lemma 3. Denote by r_k, k = 1, …, n, the nth roots of unity: r_k = e^{2πik/n}. For
all complex numbers z,

$$1 - z^n = \prod_k (1 - r_k z) \qquad (7)$$

and

$$1 = \frac{1}{n} \sum_j \prod_{k \ne j} (1 - r_k z). \qquad (8)$$


EXERCISE 3. Verify (7) and (8).
Setting A in place of z, we get

$$I - A^n = \prod_k (I - r_k A) \qquad (9)$$

and

$$I = \frac{1}{n} \sum_j \prod_{k \ne j} (I - r_k A). \qquad (10)$$

Let x be any unit vector, ‖x‖ = 1; denote

$$\prod_{k \ne j} (I - r_k A)\, x = x_j. \qquad (11)$$

Letting (9) act on x and using (11), we get

$$x - A^n x = (I - r_j A) x_j, \qquad j = 1, \ldots, n. \qquad (12)$$

From (10) acting on x, we get

$$x = \frac{1}{n} \sum_j x_j. \qquad (13)$$

Take the scalar product of (12) with x; since ‖x‖ = 1, we get on the left

$$1 - (A^n x, x) = (x - A^n x, x) = \Big(x - A^n x, \frac{1}{n} \sum_j x_j\Big) = \frac{1}{n} \sum_j (x - A^n x, x_j); \qquad (14)_1$$
in the last step we have used (13). Next use (12) on the right:

$$\frac{1}{n} \sum_j (x_j - r_j A x_j, x_j) = \frac{1}{n} \sum_j \big( \|x_j\|^2 - r_j (A x_j, x_j) \big). \qquad (14)_2$$

By the definition of w(A),

$$|(A x_j, x_j)| \le w(A)\, \|x_j\|^2. \qquad (15)$$

Suppose that w(A) ≤ 1. Then, since |r_j| = 1, it follows from (15) that the real part of
(14)_2 is nonnegative. Since (14)_2 equals (14)_1, it follows that the real part of
1 − (A^n x, x) is nonnegative.

Let ω be any complex number, |ω| = 1. From the definition of numerical radius, it
follows that w(ωA) = w(A); therefore, by the above argument, if w(A) ≤ 1, we
obtain

$$1 - \operatorname{Re}(\omega^n A^n x, x) = 1 - \operatorname{Re} \omega^n (A^n x, x) \ge 0.$$

Since this holds for all ω, |ω| = 1, it follows that

$$|(A^n x, x)| \le 1$$

for all unit vectors x. It follows that w(A^n) ≤ 1 if w(A) ≤ 1.
Since w(A) is a homogeneous function of degree 1,

$$w(zA) = |z|\, w(A),$$

and conclusion (6) of Theorem 2 follows. □
Combining Theorem 2 with Lemma 1, we obtain the following:
Combining Theorem 2 with Lemma 1, we obtain the following: 


Corollary 2. Let A denote an operator as above for which w(A) ≤ 1. Then for
all n, we have

$$\|A^n\| \le 2. \qquad (16)$$


This corollary is useful for studying the stability of difference approximations of 
hyperbolic equations. 

Note 1. The proof of Theorem 2 nowhere makes use of the finite dimensionality 
of the Euclidean space X. 

Note 2. Toeplitz and Hausdorff have proved that the numerical range of every 
mapping A is a convex subset of the complex plane. 


EXERCISE 4. Determine the numerical range of A = (} |) and of A? = (} 4). 
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Avoidance of crossing, 141 rate of, 252, 254, 260 
Convex set, 187 
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ordered, 326 gauge function, 188 
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Coset see Quotient space 
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sequence, 92 Determinant, 45, 65 
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Cayley's Theorem, 305 Cramer's rule, 54 
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Characterstic value see Eigenvalue Laplace expansion, 52 
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Domain space, 19 
Dot product. see Scalar product 
Doubly stochastic matrix, 197 
Dual: 

lattice, 318 

norm, 221, 23] 

space, 14 
Duality theorem, 206 

economics interpretation of, 208 


Eigenvalue, 59, 60, 262 
of anti-selfadjoint map, 112 
index, 72 
multiple, 68 
of selfadjoint map. 106 
simple, 129 
smallest largest, 113, 114, 166 
of unitary map, 113 
variational characterization, 113 
Eigenvector, 59, 60 
expansion, 64
generalized, 69 
Energy see vibration 
Error, 248 
Euclidean structure, 79 
complex, 95 
Euler's theorem, 172 
Exponential function, 111, 127, 128 


Farkas-Minkowski theorem, 203 
Fast matrix multiplication, 320 
Fast Fourier transform, 3525 
Fibonacci sequence, 62 
Finite Fourier transform, 119, 328 
Fischer, 116 
Flaschka, 269 
Fluid flow, 177 
curl, 179 
divergence, 180 
Francis, 263 
Frequency, 180 
Friedland, Robbin and Sylvester, 327 
Frobenius theorem, 243 
Function of selfadjoint mappings, 111 
analytic, 140
monotone, 151 
Fundamental theorem, 20 


Game theory, 209 
min max theorem, 210 
Gauge function, 188
Gauss, 246
Gaussian elimination, 39
Gershgorin's theorem, 323 
Givens, 246, 269 
Google, 242 
Gram matrix, 152, 316 
Gram-Schmidt procedure, 81 


Hadamard, 157 
Hahn-Banach theorem, 191 
complex version, 226 
Halmos-Berger-Pearcy, 369 
Hamiltonian equation, 312 

Helly's theorem, 199 
Herglotz, 152 
Hermitean symmetric, 101 
Hestenes, 246 
Hilbert-Schmidt norm, 99 
Holder inequality, 215 
Householder, 246, 266 
Hyperplane, 187
separation theorem, 190, 192


Inner product, see Scalar product
Interpolation, 22
Inverse, 25
Isometry, see also orthogonal and unitary matrix, 57
Isomorphism, 3

Iteration: 
Chebyshev, 252, 253 
conjugate gradient, 256 
iterative methods, 248 
steepest descent, 249 


Jacobi, 246 
Jacobian, 177 
Jordan form, xii, 366


Konig-Birkhoff theorem, 195 
Krein-Milman theorem, 199 


Lanczos, 246 
Lattice, 317 
dust, 318 
integer basis, 318 
Law of Inertia, 105 
Lax, 325 
Liapunov's Theorem, 358 
Linear:
bilinear, 79
combination, 4
dependence, 5
function, 13
independence, 5
operator, 20
space, 1
subspace, 4
system of equations, 39
transformation, 20
Linear mapping(s), 19
algebra of, 24
invertible, 25, 233
norm of, 89, 230
transpose of, 26
Loewner, 151
Lorentz:
group, 343
transformation, 342


Matrix, 33
anti-selfadjoint, 112
block, 38
column rank, 37
diagonal, 38
Gram, 152, 316
Hermitean, 101
Hessian, 102
identity, 38
Jacobian, 177
multiplication of, 34, 35, 320
normal, 112
orthogonal, 89
positive, 237
positive selfadjoint, 117, 143
row rank, 37
selfadjoint, 101
similar, 38
symmetric, 101
symplectic, 309
transpose, 36
tridiagonal, 38
unimodular, 318
unitary, 96
valued function, 122
Minimal polynomial, 72, 73
Minmax principle, 116, 118
of game theory, 210
Monotonic matrix function, 151
Minkowski metric, 342, 349
Moser's Theorem, 273
Muraki, 142


Nanda, 269
Non-Euclidean geometry, 347
Norm(s), 89
equivalent, 217
Euclidean, 79
dual, 90
of mapping, 230
of matrix, 59
Normal mode, 184
Normed linear space, 214
complex, 225
Nullspace, 20
Numerical:
radius, 367
range, 367


Orientation, 44
Orthogonal, 80
complement, 82
group, 89
matrix, 89, 108
projection, 85
Orthonormal basis, 80


Parallelogram law, 227
Permutation, 46
group, 34, 46
matrix, 197
signature of, 47
Perron's theorem, 237
Pfaffian, 305
Polar decomposition, 169
Polynomial:
characteristic, 61, 63
minimal, 72
Population evolution, 242
Positive definite, see Positive selfadjoint
Positive selfadjoint mapping, 117, 143
Principal minor, 131, 162
Projection, 30
orthogonal, 110
Pythagorean theorem, 80
QR algorithm, 2635
Quadratic form, 102 
diagonal form, 103 
Quadrature formula, 17 
Quotient space, 8 
normed, 225 


Random, 200 
Range, 20 
Rank, see row and column rank
Rayleigh quotient, 114 
Reflection, 89, 226 
Rellich's theorem, 140, 353 
Residual, 249 
Resolution of the identity, 110 
Riesz, F, 152, 220 
Rigid motion, 172 
angular velocity vector, 176 
infinitesimal generator, 174 
Rotation, 172 
Rounding, 247 


Scalar, 2
Scalar product, 77, 79
Schur factorization, 335
Schur's theorem, 153, 316
Schwarz inequality, 79
Selfadjoint, 106
anti-selfadjoint, 112
part, 112
Similarity, 29, 55
Simplex, ordered, 44
Singular value, 170
Solving systems of linear equations:
Chebyshev iteration, 252
optimal three-term iteration, 256
steepest descent, 249
three-term Chebyshev iteration, 255
Gaussian elimination, 39
Spectral Theorem, 70 
of commuting maps, 74 
mapping theorem, 66 
radius, 97, 334
resolution, 110 
of selfadjoint maps, 106 
Spectrum of a matrix: 
anti-selfadjoint, 112 
orthogonal, 86, 113
selfadjoint, 106
symplectic, 309 
unitary, 113 
Square root of positive matrix, 145 
Steepest descent, see Iteration
Stiefel, 246 
Stochastic matrix, 240 
Strassen, 320 
Subspace, 4 
distance from, 84, 223 
Support function, 193 
Symmetrized product, 148 
Symplectic matrix, 309 
group, 309 


Target space, 19 
Tensor product, 313 
Toda flow, 269 
Tomei, 269 
Trace, 55, 65 
Transpose: 
of linear map, 26 
of matrix, 36 
Triangle inequality, 80, 215 
Tukey, 333 


Underdetermined systems of linear equations, 21
Unitary:
map, 96
group, 96


Vandermonde matrix, 302 
determinant, 302 
Vector, 1
norm of, 79, 214 
valued function, 121 
Vector space, see Linear space
Vibration, 180
amplitude, 180, 182
energy, 180, 182
frequency of, 180 
phase, 180 
Volume, 44 
signed, 45
von Neumann, 213, 246 
and Wigner, 141 


Wielandt-Hoffman theorem, 164
practice of linear spaces and linear maps with a unique focus on the analytical aspects as well as
friendly additions that enhance the book's accessibility, including expanded topical coverage in
* The QR algorithm for finding the eigenvalues of a self-adjoint matrix
* The Householder algorithm for turning self-adjoint matrices into tridiagonal form 
* The compactness of the unit ball as a criterion of finite dimensionality of a normed linear space 
Transform; the spectral radius theorem; the Lorentz group; the compactness criterion for finite
dimensionality; the characterization of commutators; proof of Liapunov's stability criterion; the
construction of the Jordan Canonical form of matrices; and Carl Pearcy's elegant proof of Halmos'
conjecture about the numerical range of matrices.